Foreword



Education in this country has evolved dramatically from the days of one teacher in a one-room 
schoolhouse. Today, student learning is no longer confined to a physical space. Computers and the 
Internet have broken through school walls, giving students greater opportunities to personalize their 
education, access distant resources, receive extra help or more-challenging assignments, and engage 
in learning in new and unique ways. 

Although online learning is a relatively new enterprise in the K-12 arena, it is expanding rapidly, 
with increasing numbers of providers offering services and more students choosing to participate. As 
with any education program, online learning initiatives must be held accountable for results. Thus, 
it is critical for students and their parents — as well as administrators, policymakers, and funders — to 
have data informing them about program and student outcomes and, if relevant, about how well a 
particular program compares to traditional education models. To this end, rigorous evaluations are 
essential. They can identify whether programs and online resources are performing as promised, and, 
equally important, they can point to areas for improvement. 

The evaluations highlighted in this guide represent a broad spectrum of online options, from pro- 
grams that provide online courses to Web sites that feature education resources. The evaluations 
themselves range from internal assessments to external, scientific research studies. All demonstrate 
how program leaders and evaluators have been able to implement strong evaluation practices despite 
some challenges inherent to examining learning in an online environment. 

This guide complements another publication, Connecting Students to Advanced Courses Online, pub- 
lished last year by the U.S. Department of Education. Both are part of the Innovations in Education 
series, which identifies examples of innovative practices from across the country that are helping 
students achieve. 

My hope is that this guide will assist evaluators and program leaders who seek to use data to guide 
program improvement aimed at achieving positive outcomes for our nation’s students. 



Margaret Spellings, Secretary 
U.S. Department of Education 
Introduction



This guide is designed as a resource for leaders and evaluators of K-12 online learning programs. In
this guide, the term “online learning” is used to refer to a range of education programs and resources 
in the K-12 arena, including distance learning courses offered by universities, private providers, or 
teachers at other schools; stand-alone “virtual schools” that provide students with a full array of online
courses and services; and educational Web sites that offer teachers, parents, and students a range of
resources. 1 The guide features seven evaluations that represent variety in both the type of program or
resource being evaluated, and in the type of evaluation. These evaluations were selected because they 
offer useful lessons to others who are planning to evaluate an online learning program or resource. 



Of course, evaluating online learning is not alto- 
gether different from assessing any other type of 
education program, and, to some degree, eval- 
uators may face the same kind of design and 
analysis issues in both instances. Still, online 
program evaluators may encounter some unan- 
ticipated challenges in the virtual arena owing, 
for example, to the distance between program 
sites and students, participants’ unfamiliarity 
with the technology being used, and a lack of 
relevant evaluation tools. This guide examines a 
range of challenges that online program evalu- 
ators are likely to meet, some that are unique 
to online settings and others that are more gen- 
eral. It also describes how online environments 
can sometimes offer advantages to evaluators 
by presenting opportunities for streamlined data 
collection and analysis, for example. 

The guide specifically focuses on challenges 
and response strategies. All of the evaluations 
described here illustrate strong assessment prac- 
tices and robust findings, and they are models 
for demonstrating how program leaders and 
evaluators can handle the challenges of evaluat- 
ing online learning. 



Common Challenges of Evaluating 
Online Learning 

Online learning is a relatively new development 
in K-12 education but is rapidly expanding 
in both number of programs and participants. 
According to a report by the North American 
Council for Online Learning (NACOL), “As of 
September 2007, 42 states have [had] signifi- 
cant supplemental online learning programs (in 
which students enrolled in physical schools take 
one or two courses online), or significant full- 
time programs (in which students take most or 
all of their courses online), or both.” 2 In addition, 
the Internet houses an ever-expanding number 
of Web sites with a broad range of education 
resources for students, parents, and teachers. 
Given this expansion and a dearth of existing 
research on the topic, it is critical to conduct 
rigorous evaluations of online learning in K-12 
settings to ensure that it does what people hope 
it will do: help improve student learning. 

However, those undertaking such evaluations 
may well encounter a number of technical and 
methodological issues that can make this type 
of research difficult to execute. For example, the
scant research literature on K-12 online learning 
evaluation provides few existing frameworks to 
help evaluators describe and analyze programs, 
or tools, such as surveys or rubrics, they can 
use to collect data or assess program quality. 
Another common challenge when students are 
studying online is the difficulty of examining 
what is happening in multiple, geographically 
distant learning sites. And multifaceted educa- 
tion resources — such as vast Web sites offering 
a wide range of features or virtual schools that 
offer courses from multiple vendors — are also 
hard to evaluate, as are programs that utilize 
technologies and instructional models that are 
new to users. 

Furthermore, evaluations of online learning of- 
ten occur in the context of a politically load- 
ed debate about whether such programs are 
worth the investment and how much funding 
is needed to run a high-quality program; about 
whether online learning really provides students 
with high-quality learning opportunities; and 
about how to compare online and traditional 
approaches. Understandably, funders and pol- 
icymakers — not to mention students and their 
parents — want data that show whether online 
learning can be as effective as traditional educa- 
tional approaches and which online models are 
the best. These stakeholders may or may not 
think about evaluation in technical terms, but all 
of them are interested in how students perform 
in these new programs. At the same time, many 
online program leaders have multiple goals in 
mind, such as increased student engagement or 
increased student access to high-quality courses 
and teachers. They argue that test scores alone 
are an inadequate measure for capturing impor- 
tant differences between traditional and online 



learning settings. And, like educators in any 
setting — traditional or online — they may feel a 
natural trepidation about inviting evaluators to 
take a critical look at their program, fearing that 
it will hamper the progress of their program, 
rather than strengthen it. 

This guide will discuss how evaluators have 
tried to compare traditional and online learning 
approaches, what challenges they have encoun- 
tered, and what lessons they have learned. 

The Featured Evaluations 

This guide intentionally features a variety of on- 
line programs and resources, including virtual 
schools, programs that provide courses online, 
and Web sites with broad educational resources. 
Some serve an entire state, while others serve a 
particular district. This guide also includes dis- 
tinct kinds of evaluations, from internally led 
formative evaluations (see Glossary of Common 
Evaluation Terms, p. 65) to scientific research 
studies by external experts. In some cases, pro- 
gram insiders initiated the evaluations; in others, 
there were external reasons for the evaluation. 
The featured evaluations include a wide range 
of data collection and analysis activities — from 
formative evaluations that rely primarily on sur- 
vey, interview, and observation data, to scientific 
experiments that compare outcomes between 
online and traditional settings. In each instance, 
evaluators chose the methods carefully, based 
on the purpose of the evaluation and the specific 
set of research questions they sought to answer. 

The goal in choosing a range of evaluations for 
this guide was to offer examples that could be 
instructive to program leaders and evaluators in 
diverse circumstances, including those in varying
stages of maturity, with varying degrees of
internal capacity and amounts of available fund- 
ing. The featured evaluations are not without 
flaws, but they all illustrate reasonable strategies 
for tackling common challenges of evaluating 
online learning. 

To select the featured evaluations, researchers 
for this guide compiled an initial list of candidates
by searching for K-12 online learning
evaluations on the Web and in published doc- 
uments, then expanded the list through refer- 
rals from a six-member advisory group (see list 
of members in the Acknowledgments section, 
p. vii) and other knowledgeable experts in the 
field. Forty organizations were on the final list 
for consideration. 

A matrix of selection criteria was drafted and 
revised based on feedback from the advisory 
group. The three quality criteria were: 

• The evaluation considered multiple outcome 
measures, including student achievement. 

• The evaluation findings were widely commu- 
nicated to key stakeholders of the program or 
resource being studied. 

• Program leaders acted on evaluation results. 

Researchers awarded sites up to three points on 
each of these three criteria, using publicly avail- 
able information, review of evaluation reports, 
and gap-filling interviews with program leaders. 
All the included sites scored at least six of the 
possible nine points across these three criteria. 
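
The scoring rule described above can be illustrated with a short sketch. It is not the researchers' actual tool: the class name, field names, and the two candidate sites are hypothetical, and the assumption that each criterion is scored from 0 to 3 is ours (the guide says only "up to three points"). The six-of-nine threshold comes from the text.

```python
from dataclasses import dataclass

MAX_POINTS_PER_CRITERION = 3  # "up to three points" per criterion (from the guide)
MIN_TOTAL_TO_QUALIFY = 6      # "at least six of the possible nine points" (from the guide)


@dataclass
class SiteScore:
    """Hypothetical record of one candidate site's scores on the three quality criteria."""
    name: str
    multiple_outcome_measures: int  # considered multiple outcomes, incl. student achievement
    findings_communicated: int      # findings widely communicated to key stakeholders
    leaders_acted_on_results: int   # program leaders acted on evaluation results

    def total(self) -> int:
        scores = (
            self.multiple_outcome_measures,
            self.findings_communicated,
            self.leaders_acted_on_results,
        )
        # Assumption: scores are whole numbers from 0 to 3.
        for score in scores:
            if not 0 <= score <= MAX_POINTS_PER_CRITERION:
                raise ValueError(f"score out of range for {self.name}: {score}")
        return sum(scores)


def qualifies(site: SiteScore) -> bool:
    """Return True if the site meets the minimum total used for inclusion."""
    return site.total() >= MIN_TOTAL_TO_QUALIFY


# Made-up candidates, for illustration only.
candidates = [
    SiteScore("Hypothetical Site A", 3, 2, 2),
    SiteScore("Hypothetical Site B", 2, 1, 2),
]
qualifying = [site.name for site in candidates if qualifies(site)]
print(qualifying)
```

In this sketch, a site can qualify without a perfect score on any single criterion, which matches the guide's note that the featured evaluations "are not without flaws."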

Since a goal of this publication was to showcase 
a variety of types of evaluations, the potential 
sites were coded as to such additional character- 
istics as: internal vs. external evaluator, type of 
evaluation design, type of online learning pro- 



gram or resource, whether the program serves 
a district- or state-level audience, and stage of 
maturity. In selecting the featured evaluations, 
the researchers drew from as wide a range of 
characteristics as possible while keeping the 
quality criteria high. A full description of the 
methodology used to study the evaluation(s) of 
the selected sites can be found in appendix B: 
Research Methodology. 

The final selection included evaluations of the 
following online programs and resources: Ala- 
bama Connecting Classrooms, Educators, & Stu- 
dents Statewide Distance Learning, operated by 
the Alabama Department of Education; Algebra 
I Online, operated by the Louisiana Department 
of Education; Appleton eSchool, operated by 
Wisconsin’s Appleton Area School District; Ari- 
zona Virtual Academy, a public charter school; 
Chicago Public Schools’ Virtual High School; 
Digital Learning Commons in Washington state; 
and Thinkport, a Web site operated by Maryland 
Public Television and Johns Hopkins Center for 
Technology in Education. Additional informa- 
tion about each program and its evaluation(s) is 
included in table 1. 

What's in This Guide 

This guide was developed as a resource for eval- 
uators, whether external, third-party researchers, 
or program administrators and other staff who 
are considering conducting their own internal 
evaluation. Some of the evaluations highlighted 
here were carried out by external evaluators, 
while others were conducted by program staff 
or the staff of a parent organization. In all cases, 
the research was undertaken by experienced 
professionals, and this publication is aimed
primarily at readers who are familiar with basic
evaluation practices. For readers who are less
familiar with evaluation, a glossary of common
terms is available on page 65.

Table 1. Selected Variables of Profiled Online Learning Evaluations

Alabama Connecting Classrooms, Educators, & Students Statewide Distance Learning
  Type of program or resource / year initiated: Online courses and interactive video-conference classes for students across state / Piloted in spring 2006; statewide implementation in fall 2006
  Type of evaluation featured in guide [a]: External; formative and summative; includes comparisons with traditional instructional settings
  Year evaluation started: 2006
  Evaluation objective: Monitoring program implementation; program improvement; sharing best practices
  Cost of evaluation: $60,000 in 2007; $600,000 in 2008
  Funding source for evaluation: Specific allocation in program budget (originating from state of Alabama)
  Data collected: Student enrollment, completion, grades; AP [b] course pass rates; student and teacher satisfaction; description of implementation and challenges
  Data collection tools: Surveys, interviews, observations
  Improvements resulting from evaluation: Teacher professional development; improvements to technology and administrative operations

Algebra I Online, Louisiana
  Type of program or resource / year initiated: Online algebra courses for students across the state / 2002
  Type of evaluation featured in guide [a]: External and internal; formative and summative; includes comparisons with traditional instructional settings
  Year evaluation started: 2003; comparative study in 2004-05
  Evaluation objective: Determine if program is effective way to provide students with certified algebra teachers and to support the in-class teacher's certification efforts
  Cost of evaluation: $110,000 for the most labor-intensive phase, including the comparative analysis during 2004-05
  Funding source for evaluation: General program funds, grants from NCREL [c], BellSouth Foundation, and U.S. Department of Education
  Data collected: Student grades and state test scores; pre- and posttests; student use and satisfaction data; focus groups; teacher characteristics and teachers' certification outcomes
  Data collection tools: Pre- and posttests developed by evaluator, surveys
  Improvements resulting from evaluation: Teacher professional development; increased role for in-class teachers, curriculum improvements, new technologies used; expansion to middle schools

Appleton eSchool
  Type of program or resource / year initiated: Online courses for students enrolled in district's high schools (some students take all courses online) / 2002
  Type of evaluation featured in guide [a]: Internal; formative; evaluation process based on internally developed rubric
  Year evaluation started: Rubric piloted in 2006
  Evaluation objective: Program improvement; sharing best practices
  Cost of evaluation: No specific allocation; approx. $15,000 to make the rubric and evaluation process available in Web format
  Funding source for evaluation: General program funds (originating from charter grant from state of Wisconsin)
  Data collected: Internal descriptions and assessments of key program components (using rubric); mentor, student, and teacher satisfaction data; course completion and grades
  Data collection tools: Internally developed rubric and surveys
  Improvements resulting from evaluation: Mentor professional development; course content improvements; expanded interactivity in courses; improvements to program Web site and printed materials; sharing of best practices

Arizona Virtual Academy
  Type of program or resource / year initiated: Virtual charter school for students enrolled in public schools and homeschool students (no more than 20%) / 2003
  Type of evaluation featured in guide [a]: Formative and summative; external and internal
  Year evaluation started: 2003
  Evaluation objective: State monitoring; quality assurance; program improvement
  Cost of evaluation: No specific allocation
  Funding source for evaluation: General program funds (originating from state of Arizona)
  Data collected: Student enrollment, grades, and state test scores; parent, teacher, and student satisfaction data; internal and external assessments on key program components
  Data collection tools: Electronic surveys; externally developed rubric
  Improvements resulting from evaluation: Wide range of operational and instructional improvements

Chicago Public Schools - Virtual High School
  Type of program or resource / year initiated: Online courses for students enrolled in district's high schools / 2002
  Type of evaluation featured in guide [a]: External; formative
  Year evaluation started: 2002
  Evaluation objective: Assess need for mentor training and other student supports; identify ways to improve completion and pass rates
  Cost of evaluation: Approx. $25,000
  Funding source for evaluation: District's Office of Technology Services
  Data collected: Student enrollment, course completion, grades, and test scores; student use and satisfaction data; mentor assessments of needs
  Data collection tools: Surveys, interviews, focus groups
  Improvements resulting from evaluation: Designated class periods for online learning; more onsite mentors; training for mentors

Digital Learning Commons
  Type of program or resource / year initiated: Web site with online courses and a wide array of resources for teachers and students / 2003
  Type of evaluation featured in guide [a]: External and internal; formative
  Year evaluation started: 2003
  Evaluation objective: Understand usage of site; assess impact on student achievement and college readiness
  Cost of evaluation: Approx. $80,000 for the college-readiness study
  Funding source for evaluation: Bill & Melinda Gates Foundation
  Data collected: Student transcripts; student grades and completion rates; use and satisfaction data
  Data collection tools: Surveys
  Improvements resulting from evaluation: Improvements to student orientation; curriculum improvements; development of school use plans to encourage best practices

Thinkport (Maryland Public Television with Johns Hopkins)
  Type of program or resource / year initiated: Web site with a wide array of resources for teachers and students / 2000
  Type of evaluation featured in guide [a]: External and internal; formative and summative, including randomized controlled trial
  Year evaluation started: 2001; randomized controlled trial in 2005
  Evaluation objective: Understand usage of site; assess impact of "electronic field trip" on student performance
  Cost of evaluation: Estimated $40,000 for the randomized controlled trial (part of a comprehensive evaluation)
  Funding source for evaluation: Star Schools Grant
  Data collected: Student test scores on custom-developed content assessment; information about delivery of curriculum; use and satisfaction data
  Data collection tools: Test of content knowledge developed by evaluator, surveys, teacher implementation logs
  Improvements resulting from evaluation: Changes to teaching materials; changes to online content and format

[a] See Glossary of Common Evaluation Terms on page 65.
[b] Run by the nonprofit College Board, the Advanced Placement (AP) program offers college-level course work to high school students. Many institutions of higher education offer college credits to students who take AP courses.
[c] North Central Regional Educational Laboratory.

Part I of this guide focuses on some of the likely 
challenges faced by online program evaluators, 
and it is organized into the following sections: 

• Meeting the Needs of Multiple Stakeholders 

• Building on the Existing Base of Knowledge 

• Evaluating Multifaceted Online Resources 

• Finding Appropriate Comparison Groups 

• Solving Data Collection Problems 

• Interpreting the Impact of Program Maturity 

• Translating Evaluation Findings Into Action 



Each section of Part I presents practical infor- 
mation about one of the challenges of evaluat- 
ing online learning and provides examples of 
how the featured evaluations have addressed 
it. Part II synthesizes the lessons learned from 
meeting those challenges and offers recom- 
mendations based as well on research and 
conversations with experts in evaluating online 
learning. These are geared to program leaders
who are considering an evaluation and are meant to assist
them and their evaluators as they work together
to design and complete the process. Brief
profiles of each of the seven online programs 
can be found at the end of the guide, and de- 
tails about each evaluation are summarized in 
table 1. 
Meeting the Needs of Multiple
Stakeholders 

Every good evaluation begins with a clearly stat- 
ed purpose and a specific set of questions to be 
answered. These questions drive the evaluation 
approach and help determine the specific data 
collection techniques that evaluators will use. 
Sometimes, however, program evaluators find 
themselves in the difficult position of needing to 
fulfill several purposes at once, or needing to an- 
swer a wide variety of research questions. Most 
stakeholders have the same basic question — Is 
it working? But not everyone defines working 
in the same way. While policymakers may be 
interested in gains in standardized test scores, 
program leaders may be equally interested in 
other indicators of success, such as whether the 
program is addressing the needs of tradition- 
ally underrepresented subgroups, or producing 
outcomes that are only indirectly related to test 
scores, like student engagement. 

Naturally, these questions will not be answered 
in the same way. For example, if stakeholders 
want concrete evidence about the impact on 
student achievement, evaluators might conduct 
a randomized controlled trial — the gold stan- 
dard for assessing program effects — or a quasi- 
experimental design that compares test scores of 
program participants with students in matched 



comparison groups. But if stakeholders want to 
know, for example, how a program has been 
implemented across many sites, or why it is 
leading to particular outcomes, then they might 
opt for a descriptive study, incorporating such 
techniques as surveys, focus groups, or observa- 
tions of program participants to gather qualita- 
tive process data. 

When multiple stakeholders have differing in- 
terests and questions, how can evaluators meet 
these various expectations? 

To satisfy the demands of multiple stakehold- 
ers, evaluators often combine formative and 
summative components (see Glossary of Common
Evaluation Terms, p. 65). In the case of
Alabama Connecting Classrooms, Educators, & 
Students Statewide Distance Learning (ACCESS), 
described below, the evaluators have been very 
proactive in designing a series of evaluations 
that, collectively, yield information that has util- 
ity both for program improvement and for un- 
derstanding program performance. In the case 
of Arizona Virtual Academy (AZVA), the school’s 
leadership team has made the most of the many 
evaluation activities they are required to com- 
plete by using findings from those activities 
for their own improvement purposes and pig- 
gybacking on them with data collection efforts 
of their own. In each instance, program leaders 
would likely say that their evaluations are first
and foremost intended to improve their pro- 
grams. Yet, when called upon to show achieve- 
ment results, they can do that as well. 

Combine Formative and Summative Evaluation 
Approaches to Meet Multiple Demands 

From the beginning of their involvement with 
ACCESS, evaluators from the International Society 
for Technology in Education (ISTE) have taken 
a combined summative and formative approach 
to studying this state-run program that offers 
both Web-based and interactive videoconferenc- 
ing courses. The original development proposal 
for ACCESS included an accountability plan that 
called for ongoing monitoring of the program 
to identify areas for improvement and to gener- 
ate useful information that could be shared with 
other schools throughout the state. In addition, 
from the beginning, program leaders and state 
policymakers expressed interest in gathering data 
about the program’s impact on student learning. 
To accomplish these multiple goals, ISTE com- 
pleted two successive evaluations for Alabama, 
each of which had both formative and summative 
components. A third evaluation is under way. 

The first evaluation, during the program’s pilot 
implementation, focused on providing feedback 
that could be used to modify the program, if 
need be, and on generating information to share 
with Alabama schools. Evaluation activities at 
this stage included a literature review and ob- 
servation visits to six randomly selected pilot 
sites, where evaluators conducted interviews 
and surveys. They also ran focus groups, for 
which researchers interviewed respondents in a 
group setting. Evaluators chose these methods 
to generate qualitative information about how 



the pilot program was being implemented and 
what changes might be needed to strengthen it. 

The second evaluation took more of a summa- 
tive approach, looking to see whether or not 
ACCESS was meeting its overall objectives. First, 
evaluators conducted surveys and interviews 
of students and teachers, as well as interviews 
with school administrators and personnel at the 
program’s three Regional Support Centers. In 
addition, they gathered student enrollment and 
achievement data, statewide course enrollment 
and completion rates, and other program out- 
come data, such as the number of new distance 
courses developed and the number of partici- 
pating schools. 

This second evaluation also used a quasi-exper- 
imental design (see Glossary of Common Evalu- 
ation Terms, p. 65) to provide information on 
program effects. Evaluators compared achieve- 
ment outcomes between ACCESS participants 
and students statewide, between students in 
interactive videoconferencing courses and stu- 
dents in traditional settings, and between stu- 
dents who participated in online courses and 
those who took courses offered in the interac- 
tive videoconferencing format. 
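
To make the idea of comparing achievement outcomes between two groups concrete, the sketch below computes a standardized mean difference (Cohen's d with a pooled standard deviation). This is an illustration only: the guide does not say which statistics the ACCESS evaluators used, and the scores and variable names here are made up. A raw comparison like this also does not adjust for preexisting differences between groups, which is why quasi-experimental designs typically rely on matched comparison groups.

```python
import math
import statistics


def cohens_d(program_scores, comparison_scores):
    """Standardized mean difference between two groups, using a pooled standard deviation."""
    n1, n2 = len(program_scores), len(comparison_scores)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two scores")
    var1 = statistics.variance(program_scores)     # sample variance (n - 1 denominator)
    var2 = statistics.variance(comparison_scores)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    if pooled_sd == 0:
        raise ValueError("scores have no variation; effect size is undefined")
    return (statistics.mean(program_scores) - statistics.mean(comparison_scores)) / pooled_sd


# Hypothetical exam scores on a 1-5 scale; not data from any featured evaluation.
online_scores = [3, 4, 2, 5, 3, 4]
traditional_scores = [3, 3, 2, 5, 4, 4]

effect = cohens_d(online_scores, traditional_scores)
print(f"Standardized difference (online minus traditional): {effect:.2f}")
```

A value near zero would be consistent with the two groups achieving comparable scores, although a small sample like this one cannot support a firm conclusion.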

As of early 2008, the evaluators were conduct- 
ing a third evaluation, integrating the data from 
the first two studies and focusing on student 
achievement. Looking ahead, ACCESS leaders 
plan to continue gathering data annually in an- 
ticipation of conducting longitudinal studies that 
will identify ACCESS’s full impact on student
progress and achievement. 

Together, the carefully planned evaluation activities
conducted by ACCESS’s evaluators have
generated several findings and recommendations 
Reasons and Contexts for Formative Versus Summative Evaluations

Formative and summative evaluations can each serve important functions for programs. Formative evaluations, 
sometimes called "process evaluations," are conducted primarily to find out how a program is being implemented
and how it might be strengthened. Summative evaluations, also called "outcome evaluations," are appropriate for 
better-established programs, when program leaders have settled on their best policies and practices and want to 
know, for example, what results the program is yielding. 

Ideally, formative evaluations are developed as partnerships that give all stakeholders a hand in planning and help- 
ing conduct the evaluation. Explicitly framing a formative evaluation as a collaboration among stakeholders can 
help in more ways than one. Practitioners are more likely to cooperate with and welcome evaluators rather than feel 
wary or threatened, a common reaction. In addition, practitioners who are invited to be partners in an evaluation
are more likely to feel invested in its results and to implement the findings and recommendations. 

Even more than formative evaluations, summative evaluations can be perceived by practitioners as threatening 
and, in many cases, program staff are not eager to welcome evaluators into their midst. Even in these situations, 
however, their reaction can be mitigated if evaluators work diligently to communicate the evaluation's goals. Evalu- 
ators should make clear their intention to provide the program with information that can be used to strengthen it, or 
to give the program credible data to show funders or other stakeholders. In many cases, summative evaluations do 
not uncover findings that are unexpected; they merely provide hard data to back up the anecdotes and hunches of 
program leaders and staff. 

Program leaders who are contemplating an evaluation also will want to consider the costs of whatever type of study 
they choose. Some formative evaluations are relatively informal. For example, a formative evaluation might consist 
primarily of short-term activities conducted by internal staff, like brief surveys of participants, to gather feedback about 
different aspects of the program. This type of evaluation is inexpensive and can be ideal for leaders seeking ongoing 
information to strengthen their program. In other instances, formative evaluation is more structured and formal. For 
instance, an external evaluator may be hired to observe or interview program participants, or to conduct field surveys 
and analyze the data. Having an external evaluator can bring increased objectivity, but it also adds cost. 

In many cases, summative evaluations are more formal and expensive operations, particularly if they are using 
experimental or quasi-experimental designs that require increased coordination and management and sophisticated 
data analysis techniques. Typically, external evaluators conduct summative evaluations, which generally extends 
the timeline and raises the costs. Still, experimental and quasi-experimental designs may provide the most reliable
information about program effects. 

Finally, program leaders should consider that an evaluation need not be exclusively formative or summative. As 
the ACCESS case illustrates (see pp. 7-10), sometimes it is best for programs to combine elements of both, either 
concurrently or in different years. 



that already have been used to strengthen the 
program. For example, their findings suggest 
that students participating in the distance learn- 
ing courses are completing courses at high rates 
and, in the case of the College Board Advanced 



Placement (AP) courses,* are achieving scores 
comparable to students taught in traditional 

* Run by the nonprofit College Board, the Advanced Placement program offers col- 
lege-level course work to high school students. Many institutions of higher education 
offer college credits to students who take AP courses. 
settings. These kinds of data could be critical to
maintaining funding and political support for the 
program in the future. ACCESS also is building a 
longitudinal database to provide a core of data 
for use in future evaluations and, program leaders
hope, to help determine ACCESS’s long-term
impact on student progress and achievement. 
With these data collection tools and processes 
in place, ACCESS has armed itself to address the 
needs and expectations of various stakeholders 
inside and outside the program. 

Make the Most of Mandatory Program 
Evaluations 

While all leaders and staff of online education 
programs are likely to want to understand their 
influence on student learning, some have no 
choice in the matter. Many online programs must 
deliver summative student outcome data because 
a funder or regulatory body demands it. In the 
case of Arizona Virtual Academy (AZVA), a K-12 
statewide public charter school, the program 
must comply with several mandatory evaluation 
requirements: First, school leaders are required 
by the state of Arizona to submit an annual ef- 
fectiveness review, which is used to determine 
whether or not the school’s charter will be re- 
newed. For this yearly report, AZVA staff must 
provide data on student enrollment, retention, 
mobility, and state test performance. The report 
also must include pupil and parent satisfaction 
data, which AZVA collects online at the end of 
each course, and a detailed self-evaluation of 
operational and administrative efficiency. 

AZVA also must answer to K12 Inc., the educa- 
tion company that supplies the program’s cur- 
riculum for all grade levels. K12 Inc. has its own 
interest in evaluating how well its curriculum 



products are working and in ensuring that it is 
partnered with a high-quality school. AZVA’s di- 
rector, Mary Gifford, says that “from the second 
you open your school,” there is an expectation 
[on the part of K12 Inc.] that you will collect 
data, analyze them, and use them to make deci- 
sions. “K12 Inc. has established best practices for 
academic achievement. They take great pride in 
being a data-driven company,” she adds. K12 Inc. conducts
quality assurance audits at AZVA approximately
every two years; each audit consists of a site
visit conducted by K12 Inc. personnel and an ex- 
tensive questionnaire, completed by AZVA, that 
documents various aspects of the program, such 
as instruction, organizational structure, and par- 
ent-school relations. K12 Inc. also requires AZVA 
to produce a detailed annual School Improve- 
ment Plan (SIP), which covers program opera- 
tions as well as student achievement. The plan 
must include an analysis of student performance 
on standardized state tests, including a compari- 
son of the performance of AZVA students to the 
performance of all students across the state. 

Each of these mandates — those of the state and 
those of AZVA’s curriculum provider — has an im- 
portant purpose. But the multiple requirements 
add up to what could be seen as a substantial 
burden for any small organization. AZVA’s small 
central staff chooses to look at it differently. 
Although the requirements generate year-round 
work for AZVA employees, they have made the 
most of these activities by using them for their 
own purposes, too. Each of the many mandated 
evaluation activities serves an internal purpose: 
Staff members pore over test scores, course 
completion data, and user satisfaction data to 
determine how they can improve their program. 
The SIP is used as a guiding document to or- 
ganize information about what aspects of the 
program need fixing and to monitor the school’s
progress toward its stated goals. Although the 
process is time-consuming, everyone benefits: 
K12 Inc. is assured that the school is doing what 
it should, AZVA has a structured approach to 
improving its program, and it can demonstrate 
to the state and others that student performance 
is meeting expectations. 

AZVA is able to make the most of its mandatory 
evaluations because the school’s culture sup- 
ports it: Staff members incorporate data collec- 
tion and analysis into their everyday responsibil- 
ities, rather than viewing them as extra burdens 
on their workload. Furthermore, AZVA’s lead- 
ers initiate data collection and analysis efforts 
of their own. They frequently conduct online 
surveys of parents to gauge the effectiveness of 
particular services. More recently, they also have 
begun to survey teachers about their profession- 
al development needs and their satisfaction with 
the trainings provided to them. “We’re trying to 
do surveys after every single professional de- 
velopment [session], to find out what was most 
effective,” says Gifford. “Do they want more of 
this, less of this? Was this too much time? Was 
this enough time? That feedback has been very 
good.” AZVA’s K-8 principal, Bridget Schleifer, 
confirms that teachers’ responses to the surveys 
are taken very seriously. “Whenever a survey 
comes up and we see a need,” she says, “we 
will definitely put that on the agenda for the 
next month of professional development.” 

Together, these many efforts provide AZVA with 
comprehensive information that helps the school 
address external accountability demands, while 
also serving internal program improvement ob- 
jectives. Just as important, AZVA’s various evalu- 
ation activities are integrated and support each 



other. For instance, the SIP is based on the find- 
ings from the evaluation activities mandated 
by the state and K12 Inc., and the latter’s audit 
process includes an update on progress made 
toward SIP goals. More broadly, the formative 
evaluation activities help the school leaders to 
set specific academic goals and develop a plan 
for reaching them, which ultimately helps them 
improve the achievement outcomes assessed by 
the state. One lesson AZVA illustrates is how to 
make the most of evaluations that are initiated 
externally by treating every data collection ac- 
tivity as an opportunity to learn something valu- 
able that can serve the program. 

Summary 

As the above program evaluations demonstrate, 
sometimes the best approach to meeting the 
needs of multiple stakeholders is being proac- 
tive. The steps are straightforward but critical: 
When considering an evaluation, program lead- 
ers should first identify the various stakehold- 
ers who will be interested in the evaluation 
and what they will want to know. They might 
consider conducting interviews or focus groups 
to collect this information. Leaders then need 
to sift through this information and prioritize 
their assessment goals. They should develop a 
clear vision for what they want their evaluation 
to do and work with evaluators to choose an 
evaluation type that will meet their needs. If it 
is meant to serve several different stakeholder 
groups, evaluators and program leaders might 
decide to conduct a multi-method study that 
combines formative and summative evaluation 
activities. They might also consider develop- 
ing a multiyear evaluation plan that addresses 
separate goals in different years. In the report- 
ing phase, program leaders and evaluators can 
consider communicating findings to different
stakeholder audiences in ways that are tailored 
to their needs and interests. 

In instances where online programs partici- 
pate in mandatory evaluations, program leaders 
should seek to build on these efforts and use 
them for internal purposes as well. They can 
leverage the information learned in a summa- 
tive evaluation to improve the program, acquire 
funding, or establish the program’s credibility. 
Evaluators can help program leaders piggyback 
on any mandatory assessment activities by se- 
lecting complementary evaluation methods that 
will provide not just the required data but also 
information that program staff can use for their 
own improvement purposes. 

Building on the Existing Base of 
Knowledge 

Under normal circumstances, evaluators fre- 
quently begin their work by reviewing available 
research literature. They also may search for a 
conceptual framework among similar studies or 
look for existing data collection tools, such as 
surveys and rubrics, that can be borrowed or 
adapted. Yet, compared to many other topics in 
K-12 education, the body of research literature 
on K-12 online learning is relatively new and 
narrow. Available descriptive studies are often 
very specific and offer findings that are not eas- 
ily generalized to other online programs or re- 
sources. Empirical studies are few. Other kinds 
of tools for evaluators are limited, too. Recent 
efforts have led to multiple sets of standards for 
K-12 online learning (see Standards for K-12 
Online Learning, p. 13). However, there still are 
no widely accepted education program outcome 



measures, making it difficult for evaluators to 
gauge success relative to other online or tradi- 
tional programs. 

Given the existing base of knowledge on K-12 
online learning, how should evaluators proceed? 
Of course, evaluators will first want to consult 
the K-12 online learning research that does ex- 
ist. Although the field is comparatively limited, it 
is growing each year and already has generated 
a number of significant resources (see appendix 
A, p. 59). Among other organizations, the North 
American Council for Online Learning (NACOL) 
has developed and collected dozens of publica- 
tions, research studies, and other resources use- 
ful for evaluators of K-12 online learning pro- 
grams. In some cases, evaluators also may want 
to look to higher education organizations, which 
have a richer literature on online learning evalu- 
ation, including several publications that identify 
standards and best practices (see appendix A). 
In some cases, these resources can be adapted 
for K-12 settings, but in other cases, researchers 
have found, they do not translate well. 

Another approach is to develop evaluation tools 
and techniques from scratch. As described be- 
low, this may be as simple as defining and stan- 
dardizing the outcome measures used among 
multiple schools or vendors, like the evaluators 
of Digital Learning Commons did. Or it may be 
a much more ambitious effort, as when Apple- 
ton eSchool’s leaders developed a new model 
for evaluating virtual schools with an online sys- 
tem for compiling evaluation data. Finally, some 
evaluators respond to a limited knowledge base 
by adding to it. For example, the evaluators of 
Louisiana’s Algebra I Online program have pub- 
lished their evaluation findings for the benefit 
of other evaluators and program administrators. 
Standards for K-12 Online Learning

As online learning in K-12 education has expanded, there has been an effort to begin to codify current best practices
into a set of standards that educators can look to in guiding their own performance. Several organizations
have released sets of these emerging standards based on practitioner input to date.

In late 2006, the Educational Technology Cooperative of the Southern Regional Education Board (SREB) issued 
Standards for Quality Online Courses. These standards are available along with other key resources from SREB at 
http://www.sreb.org/programs/EdTech/SVS/index.asp. 

A year later, NACOL published National Standards of Quality for Online Courses, which endorsed SREB's standards 
and added a few others. The national standards cover six broad topic areas: course content, instructional design, 
student assessment, technology, course evaluation and management, and 21st-century skills. 

NACOL also developed National Quality Standards for Online Teaching in 2008 and is currently working on pro- 
gram standards. The standards for online courses and teaching can be found at http://www.nacol.org along with 
many other resources. 

The National Education Association also has published standards for online courses and online teachers, both 
available at http://www.nea.org. 



In a different but equally helpful fashion, the 
leaders of Appleton eSchool have contributed to 
the field by developing a Web site that allows
administrators of online programs to share their 
best practices in a public forum. 

Clearly Define Outcome Measures 

Evaluators must be able to clearly articulate key 
program goals, define outcomes that align with 
them, and then identify specific outcome mea- 
sures that can be used to track the program’s 
progress in meeting the goals. Presently, how- 
ever, outcome measures for evaluating online 
learning programs are not consistently defined, 
which makes it difficult for stakeholders to gauge 
a program’s success, compare it to other pro- 
grams, or set improvement goals that are based 
on the experience of other programs. Further- 
more, the lack of consistent outcome measures 
creates technical headaches for evaluators. A re- 
cent article co-authored by Liz Pape, president 



and chief executive officer of Virtual High School 
Global Consortium, a nonprofit network of on- 
line schools, describes the problem this way: 

Although standards for online course and
program effectiveness have been identified,
data-driven yardsticks for measuring
against those standards are not generally
agreed upon or in use. There is no general
agreement about what to measure and
how to measure. Even for measures
that most programs use, such as course
completion rates, there is variation in the
metrics because the online programs that
measure course completion rates do not
measure in the same manner. 3

The evaluators of Washington state’s Digital 
Learning Commons (DLC) encountered just such 
a problem when attempting to calculate the 
number of online course-takers served by the 
program. DLC is a centrally hosted Web portal 
that offers a wide range of online courses from 
numerous private vendors. When evaluators
from Cohen Research and Evaluation tried to an- 
alyze DLC’s course-taking and completion rates, 
they found a range of reporting practices among 
the vendors: Some tracked student participation 
throughout the course, while others reported 
only on the number of students who completed 
a course and received a final grade. Also, in some 
cases, vendors did not differentiate between stu- 
dents who withdrew from a course and students 
who received an F, conflating two different stu- 
dent outcomes that might have distinct implica- 
tions for program improvement. 

Pape et al. note additional problems in the ways 
that course completion is defined: 

When does the measure begin? How is
completion defined? Do students have a
“no penalty” period of enrollment in the
online course during which they may
drop from the course and will not be
considered when calculating the course
completion rate? Is completion defined
as a grade of 60 or 65? How are students
who withdrew from the course after the
“no penalty” period counted, especially if
they withdrew with a passing grade? 4

Following the recommendations of their evalua- 
tors, DLC staff made efforts to communicate defi- 
nitions of course completion and withdrawal that 
were internally consistent and made sure each 
vendor was reporting accurate data based on 
these conversations. The result was a higher level 
of consistency and accuracy in data reporting. 
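
The questions Pape et al. raise can be made concrete by writing a completion-rate definition down as a calculation. The following Python sketch is illustrative only. The Enrollment record layout, the 14-day no-penalty window, and the passing grades of 60 and 65 are assumptions chosen for the example; they are not the definitions that DLC or its vendors adopted. The sketch keeps withdrawals and failing grades as separate outcomes and excludes no-penalty drops from the denominator.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Enrollment:
    """One student's enrollment in one online course (hypothetical record layout)."""
    student_id: str
    days_enrolled: int            # days from enrollment to withdrawal or course end
    withdrew: bool                # True if the student left before the course ended
    final_grade: Optional[float]  # None when no final grade was recorded


def classify(e: Enrollment, no_penalty_days: int, passing_grade: float) -> str:
    """Assign exactly one outcome, keeping withdrawals separate from failing grades."""
    if e.withdrew:
        if e.days_enrolled <= no_penalty_days:
            return "dropped_no_penalty"
        return "withdrew"
    if e.final_grade is not None and e.final_grade >= passing_grade:
        return "completed"
    return "failed"


def outcome_counts(enrollments: List[Enrollment],
                   no_penalty_days: int = 14,       # assumed window
                   passing_grade: float = 60.0      # assumed threshold
                   ) -> Dict[str, int]:
    """Count enrollments in each outcome category."""
    counts = {"dropped_no_penalty": 0, "withdrew": 0, "failed": 0, "completed": 0}
    for e in enrollments:
        counts[classify(e, no_penalty_days, passing_grade)] += 1
    return counts


def completion_rate(enrollments: List[Enrollment],
                    no_penalty_days: int = 14,
                    passing_grade: float = 60.0) -> Optional[float]:
    """Completed enrollments divided by all enrollments that stayed past the window."""
    counts = outcome_counts(enrollments, no_penalty_days, passing_grade)
    counted = counts["withdrew"] + counts["failed"] + counts["completed"]
    if counted == 0:
        return None  # no enrollments to report on
    return counts["completed"] / counted


# Made-up roster for illustration; not data from any program in this guide.
roster = [
    Enrollment("a", 90, False, 85.0),
    Enrollment("b", 90, False, 55.0),
    Enrollment("c", 10, True, None),   # dropped inside the no-penalty window
    Enrollment("d", 40, True, 72.0),   # withdrew later with a passing grade
    Enrollment("e", 90, False, 62.0),
]
```

With this made-up roster, the same five enrollments yield a completion rate of 50 percent when passing is defined as a grade of 60 and 25 percent when it is defined as 65, which is why vendors reporting to a shared portal need one agreed definition before their rates can be compared.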

There is growing attention to the problem of un- 
defined outcome measures in the field of evalu- 
ating online learning. A 2004 report by Cathy 
Cavanaugh et al. specifically recommended
that standards be developed “for reporting the 



academic and programmatic outcomes of dis- 
tance learning programs.” 5 A NACOL effort is 
under way to develop benchmarks for measur- 
ing program effectiveness and overall standards 
for program quality. Meanwhile, the best that evaluators
can do is to ensure internal consistency in
the outcome measures used across all of their 
own data sources. Although the research base 
is limited, evaluators may be able to find similar 
studies for ideas on how to define outcomes. 
Moving forward, online program evaluators can 
proactively find ways to share research methods 
and definitions and reach a consensus on the 
best ways to measure program effectiveness. 

Work Collaboratively to Develop New 
Evaluation Tools 

In the early years of Appleton eSchool, school 
leaders became aware that they needed an over- 
all evaluation system to determine the school’s 
strengths and weaknesses. After failing to find an 
existing comprehensive tool that would fit their 
needs, Ben Vogel, Appleton’s principal and Gov- 
ernance Board chair, and Connie Radtke, Apple- 
ton’s program leader, began to develop their 
own evaluation process. Their goal was to de- 
sign an instrument that would identify the core 
components necessary for students to be suc- 
cessful in an online program. In addition, they 
wanted to create a process that could be used to 
prompt dialogue among program leaders, staff, 
governance board members, and external colleagues
about the components of a successful
online learning experience, in order to provide 
direction for future growth and enhancement. 

Through extensive internal discussions and 
consultation with external colleagues, includ- 
ing staff from such online programs as Virtual 
High School and Florida Virtual School, Vogel
and Radtke identified eight key program
components and developed a rubric for mea- 
suring them called the Online Program Perceiv- 
er Instrument (OPPI). (See Key Components of 
Appleton eSchool’s Online Program Perceiver 
Instrument, p. 16.) Vogel says, “[Our goal was] 
to figure out those universal core components 
that are necessary in a K-12 online program to
allow students to be successful. . . . We were able 
to share [our initial thoughts] with other people 
in the K-12 realm, and say, ‘What are we miss- 
ing? And what other pieces should we add?’ It 
just kind of developed from there.” 

The core components identified in the OPPI are 
what Appleton leaders see as the essential build- 
ing blocks for supporting students in an online 
environment. The eight components address the 
online program user’s entire experience, from 
first learning about the course or program to 
completing it. 

When developing the OPPI, Vogel and Radtke 
researched many existing online program eval- 
uations in higher education, but found them 
insufficient for building a comprehensive rubric 
at the K-12 level. Vogel notes, for example, that 
having face-to-face mentors or coaches for stu- 
dents taking online courses is critical at the K-12 
level, whereas it is not considered so important 
for older students who are studying online in 
higher education. To capture this program ele- 
ment, Vogel and Radtke included “Program Sup- 
port” as a key component in the OPPI rubric, fo- 
cusing on the training given to mentors (usually 
parents in the Appleton eSchool model) as well 
as training for local school contacts who sup- 
port and coordinate local student access to on- 
line courses (see fig. 1, Excerpt from Appleton 



eSchool’s Online Program Perceiver Instrument, 
p. 17). To assess mentor perception of program 
quality, the evaluators surveyed them following 
the completion of each online course. 

As part of the OPPI process, Vogel and Radtke 
developed a three-phase approach to internal 
evaluation, which they refer to as a “self-dis- 
covery” process. In the Discovery Phase, pro- 
gram personnel fill out a report that describes 
the school’s practices in each of the eight areas 
identified in the OPPI. Then program decision- 
makers use the rubric to determine what level of 
program performance is being attained for each 
element: deficient, developing, proficient, or ex- 
emplary. In addition, program leaders e-mail sur- 
veys to students, mentors (usually parents), and 
teachers at the end of each course, giving them 
an opportunity to comment on the program’s 
performance in each of the eight OPPI areas. 
In the Outcome Phase, results from the Discov- 
ery Phase report and surveys are summarized, 
generating a numerical rating in each program 
area. At the same time, information on student 
outcomes is reviewed, including student grades, 
grade point averages, and course completion 
rates. Program decision-makers synthesize all 
of these data in an outcome sheet and use it to
set goals for future growth and development. 
Finally, in the Sharing of Best Practices Phase, 
program leaders may select particular practices 
to share with other programs. Appleton has part- 
nered with other online programs to form the 
Wisconsin eSchool Network, a consortium of vir- 
tual schools that share resources. The Network’s 
Web site includes a Best Practices Portfolio and 
schools using the OPPI are invited to submit ex- 
amples from their evaluations. 6 Practices that are 
determined to be in the “proficient” or “exem- 
plary” range are considered for placement in the 
Key Components of Appleton eSchool’s Online Program
Perceiver Instrument (OPPI) 

Practitioners and program administrators use the OPPI to evaluate program performance in eight different areas: 

1. Program Information: System provides and updates information necessary for prospective users to understand the
program being offered and determine whether it may be a good fit for students. 

2. Program Orientation: System provides an introduction or orientation that prepares students to be successful in the 
online course. 

3. Program Technology: System provides and supports program users' hardware and software needs in the online 
environment. 

4. Program Curriculum: System provides and supports an interactive curriculum for the online course. 

5. Program Teaching: System provides and supports teaching personnel dedicated to online learning and their 
online students. 

6. Characteristics and Skills Displayed by Successful Online Students: System identifies and provides opportunities 
for students to practice characteristics necessary for success in an online environment. 

7. Program Support: System provides and supports a system of support for all online students and mentors (e.g.,
parents) and coaches.

8. Program Data Collection: System collects data and uses that data to inform program decision-makers and share 
information with other programs. 



portfolio, and participating schools are cited for 
their contributions. The entire evaluation system 
is Web-based, allowing for streamlined data col- 
lection, analysis, and sharing. 
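
The Outcome Phase described above turns rubric judgments and survey responses into a numerical rating for each program area. The guide does not say how Appleton computes that rating, so the following Python sketch shows only one plausible approach. The point values assigned to the four performance levels, the 1-to-4 survey scale, and the equal weighting of rubric and survey results are all assumptions made for illustration, and the function name component_rating is hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence

# Assumed point values for the four OPPI performance levels (not taken from the OPPI).
LEVEL_POINTS: Dict[str, int] = {
    "deficient": 1,
    "developing": 2,
    "proficient": 3,
    "exemplary": 4,
}


def component_rating(element_levels: Sequence[str],
                     survey_scores: Sequence[float],
                     rubric_weight: float = 0.5) -> float:
    """Blend rubric levels and survey scores into one rating on a 1-to-4 scale.

    element_levels: the level chosen for each element of one component.
    survey_scores: end-of-course survey responses, assumed to use a 1-to-4 scale.
    rubric_weight: share of the rating that comes from the rubric (assumed 0.5).
    """
    rubric_score = mean(LEVEL_POINTS[level] for level in element_levels)
    if not survey_scores:
        # With no survey responses, fall back to the rubric result alone.
        return round(rubric_score, 2)
    blended = rubric_weight * rubric_score + (1 - rubric_weight) * mean(survey_scores)
    return round(blended, 2)


# Made-up ratings for the five elements of one component, plus four survey responses.
program_support = component_rating(
    element_levels=["proficient", "developing", "proficient", "exemplary", "proficient"],
    survey_scores=[4, 4, 3, 3],
)
print(program_support)  # 3.25
```

A single number per component makes it easier to compare program areas and track change from year to year, but the choice of weights is a judgment call that program leaders would need to agree on and document.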

Appleton’s decision to develop its own evaluation
rubric and process provides several advantages.
Besides gaining a perfectly tailored evaluation
process, Appleton leaders also have the ability to
evaluate their program at any time without waiting
for funding or relying on a third-party evaluator.
Still, developing an evaluation process is typically 
expensive and may not be a practical option for 
many programs. Appleton’s leaders spent many 
hours researching and developing outcome mea- 
sures (i.e., the descriptions of practice for each 
program component under each level of program 



performance). They also invested about $15,000 
of their program grant funds to pay a Web developer
to design the online system for compiling
and displaying evaluation data. For others attempting
to develop this type of tailored rubric
and process, accessing outside expertise is critical
to fill gaps in knowledge or capacity. Appleton
leaders collaborated extensively with experienced
colleagues from other virtual schools, particularly
as they were developing their rubric.

Share Evaluation Findings With Other 
Programs and Evaluators 

Although the OPPI rubric was developed specifi- 
cally for Appleton, from the beginning Vogel and 
Radtke intended to share it with other programs. 



Figure 1. Excerpt From Appleton eSchool's Online Program Perceiver Instrument



Component 7. Program Support. The purpose of the support network is to provide additional support 
to the student that complements the instructor. This includes not only a support person from home, 
but also other school resources that may include a counselor, social worker, etc. 



LEVEL OF PROGRAM PERFORMANCE, BY ELEMENT

ELEMENT: Overall Student Support Structure Plan

    DEFICIENT: No student support structure plan in place or the plan is poorly written and/or incomplete. Students are not receiving necessary resources.

    DEVELOPING: Student support structure plan is in place but parts of the plan are incomplete or may be unclear. The plan provides for resources necessary for student success but some gaps in the plan may exist.

    PROFICIENT: Student support structure plan is in place and is clear in its purpose and objectives. The plan provides for resources necessary for student success.

    EXEMPLARY: Exemplary student support structure plan is in place and is a proven model for assuring students have necessary support and ongoing support.

ELEMENT: Mentor responsibilities have been developed and shared

    DEFICIENT: Mentor responsibilities have not been developed or are incomplete and/or unclear or have not been appropriately shared.

    DEVELOPING: Mentor responsibilities have been developed and shared but parts may be unclear and/or all mentors have not received the information.

    PROFICIENT: Mentor responsibilities are well written, clear, and have been shared with all mentors.

    EXEMPLARY: Mentor responsibilities are extremely well written, clear, and shared in numerous formats with all mentors.

ELEMENT: Mentors are provided necessary training and support

    DEFICIENT: No mentor training program provided or there are no written objectives that clearly outline the purpose of the mentor training program. No ongoing support is available.

    DEVELOPING: Mentor training program is provided and there are written objectives that outline the purpose of the mentor training. Some objectives may be unclear. Ongoing support is available, but may be inconsistent.

    PROFICIENT: Mentor training program is provided and there are objectives that outline the purpose of the mentor training. Few objectives are unclear. Ongoing support is available.

    EXEMPLARY: Mentor training program is provided and there are extremely clear and concise objectives that outline the purpose of the mentor training program. Ongoing support is consistently provided.

ELEMENT: Mentors provide positive support for student

    DEFICIENT: No system in place to monitor mentor support and/or concerns that may arise.

    DEVELOPING: System in place to monitor mentor support and/or concerns that may arise, but system may break down from time to time.

    PROFICIENT: System in place to monitor mentor support and/or concerns that may arise. System is reliable most of the time.

    EXEMPLARY: Proven system in place to monitor mentor support and/or concerns that may arise. The system provides opportunities for two-way communication.

ELEMENT: Mentors have the ability to communicate with teacher

    DEFICIENT: No system in place to allow mentors to communicate with teacher in a timely manner.

    DEVELOPING: System in place to allow mentors to communicate with teacher in a timely manner but system may break down from time to time.

    PROFICIENT: System in place to allow mentors to communicate with teacher in a timely manner. System is reliable most of the time.

    EXEMPLARY: Proven system in place to allow mentors to communicate with teacher in a timely manner. The system provides check-ins with teacher on a regular basis.



* The U.S. Department of Education does not mandate or prescribe particular curricula or lesson plans. The information in this figure was provided by the 
identified site or program and is included here as an illustration of only one of many resources that educators may find helpful and use at their option. The 
Department cannot ensure its accuracy. Furthermore, the inclusion of information in this figure does not reflect the relevance, timeliness, or completeness of 
this information; nor is it intended to endorse any views, approaches, products, or services mentioned in the figure. 
This rubric is currently accessible free of charge
through the Web site of the Wisconsin eSchool 
Network, described above.* Vogel explains that 
the OPPI and its umbrella evaluation system are 
readily adaptable to other programs: “Internally, 
this system doesn’t ask people to have an over- 
whelming amount of knowledge. It allows peo- 
ple to make tweaks as needed for their particu- 
lar programs, but they don’t have to create the 
whole wheel over again [by designing their own 
evaluation system].” The OPPI system also allows 
for aggregation of results across multiple pro- 
grams — a mechanism that would allow groups of 
schools in a state, for example, to analyze their 
combined data. To assist schools using the OPPI 
for the first time, Appleton offers consultation 
services to teach other users how to interpret and 
communicate key findings. Through their efforts 
to share their evaluation tool and create the on- 
line forum, Appleton leaders have developed an 
efficient and innovative way to build the knowl- 
edge base on online learning programs. 

In Louisiana, evaluators from Education Devel- 
opment Center (EDC) have used more conven- 
tional channels for sharing findings from the 
evaluation of the state’s Algebra I Online pro- 
gram. This program was created by the Louisiana 
Department of Education to address the state’s 
shortage of highly qualified algebra teachers, especially
in urban and rural settings. Districts that want to give
their certified teachers access to pedagogy training and
mentoring, so that they can build capacity for strong
mathematics instruction, also are eligible to participate.
In Algebra I Online courses, students physically attend
class in a standard bricks-and-mortar classroom 
at their home school, which is managed by a 

* Registration is required to access the OPPI and some consultation with its developers 
may be needed to implement the process fully. 



teacher who may not be certified to deliver al- 
gebra instruction. But once in this classroom, 
each student has his or her own computer and 
participates in an online class delivered by a 
highly qualified (i.e., certified) algebra teacher. 
The in-class teacher gives students face-to-face 
assistance, oversees lab activities, proctors tests, 
and is generally responsible for maintaining an 
atmosphere that is conducive to learning. The 
online teacher delivers the algebra instruction, 
answers students’ questions via an online dis- 
cussion board, grades assignments via e-mail, 
provides students with feedback on homework 
and tests, and submits grades. The online and 
in-class teachers communicate frequently with 
each other to discuss students’ progress and col- 
laborate on how to help students learn the par- 
ticular content being covered. This interaction 
between teachers not only benefits students; it 
also serves as a form of professional develop- 
ment for the in-class instructors. In addition to 
providing all students with high-quality algebra 
instruction, a secondary goal of the program 
is to increase the instructional skills of the in- 
class teachers and support them in earning their 
mathematics teaching certificate. 

Although its founders believed that the Alge- 
bra I Online model offered great promise for 
addressing Louisiana’s shortage of mathemat- 
ics teachers, when the program was launched 
in 2002 they had no evidence to back up this 
belief. The key question was whether such a 
program could provide students with learning 
opportunities that were as effective as those 
in traditional settings. If it were as effective, 
the program could provide a timely and cost- 
effective solution for the mathematics teacher 
shortage. Louisiana needed hard evidence to 
show whether the program was credible. 
Following a number of internal evaluation activities
during the program’s first two years, in 2004
the program’s leaders engaged an external eval- 
uation team consisting of Rebecca Carey of EDC, 
an organization with experience in researching 
online learning; Laura O’Dwyer of Boston Col- 
lege’s Lynch School of Education; and Glenn 
Kleiman of the Friday Institute for Educational 
Innovation at North Carolina State University, 
College of Education. The evaluators were im- 
pressed by the program leaders’ willingness to 
undergo a rigorous evaluation. “We didn’t have 
to do a lot of convincing,” says EDC’s Carey. 
“They wanted it to be as rigorous as possible, 
which was great and, I think, a little bit unusual.” 
The program also was given a boost in the form 
of a grant from the North Central Regional Edu- 
cational Laboratory (NCREL), a federally funded 
education laboratory. The grant funded primary 
research on the effectiveness of online learning 
and provided the project with $75,000 beyond its 
initial $35,000 evaluation budget from the state 
legislature. The additional funding allowed EDC 
to add focus groups and in-class observations, as 
well as to augment its own evaluation capacity 
by hiring an external consultant with extensive 
expertise in research methodology and analysis. 

The EDC evaluators chose a quasi-experimen- 
tal design (see Glossary of Common Evaluation 
Terms, p. 65) to compare students enrolled in 
the online algebra program with those studying 
algebra only in a traditional face-to-face class- 
room format. To examine the impact of the Alge- 
bra I Online course, they used hierarchical linear 
modeling to analyze posttest scores and other 
data collected from the treatment and control 
groups. To determine if students in online learn- 
ing programs engaged in different types of peer- 
to-peer interactions and if they perceived their 
learning experiences differently than students in 
traditional classrooms, the evaluators surveyed 
students in both environments and conducted 
observations in half of the treatment classrooms. 
In total, the evaluators studied Algebra I Online 
courses and face-to-face courses in six districts. 

After completing their assessment, the evaluators 
produced final reports for the Louisiana Depart- 
ment of Education and NCREL and later wrote 
two articles about the program for professional 
journals. The first article, published in the Jour- 
nal of Research on Technology in Education, 7 
described Algebra I Online as a viable model for 
providing effective algebra instruction. In the 
study, online students showed comparable (and 
sometimes stronger) test scores, stayed on task, 
and spent more time interacting with classmates 
about math content than students in traditional 
classroom settings. The evaluators speculated 
that this was a result of the program’s unique 
model, which brings the online students togeth- 
er with their peers at a regularly scheduled time. 
The evaluators found a few areas for concern as 
well. For example, a higher percentage of on- 
line students reported that they did not have a 
good learning experience, a finding that is both 
supported and contradicted by research studies 
on online learning from higher education. The 
evaluation also found that the Algebra I Online 
students felt less confident in their algebra skills 
than did traditional students, a finding the evalu- 
ators feel is particularly ripe for further research 
efforts. (For additional discussion, see Interpret- 
ing the Impact of Program Maturity, p. 40.) 

The Algebra I Online evaluators wrote and pub- 
lished a second article in the Journal of Asyn- 
chronous Technologies that focused on the pro- 
gram as a professional development model for 
uncertified or inexperienced math teachers. 8 
In this piece, the evaluators described the pro- 
gram’s pairing of online and in-class teachers as 
a “viable online model for providing [the in-class] 
teachers with an effective model for authentic 
and embedded professional development that is 
relevant to their classroom experiences.” 

Of course, not all programs will have the re- 
sources to contribute to the research literature in 
this manner. In Louisiana’s case, the evaluators 
had extensive expertise in online evaluation and 
took the time and initiative required for publish- 
ing their findings in academic journals. In so 
doing, they served two purposes: providing the 
program leaders with the evidence they needed 
to confidently proceed with Algebra I Online 
and publishing much-needed research for states 
that might be considering similar approaches. 

Summary 

The literature on K-12 online learning is grow- 
ing. Several publications and resources docu- 
ment emerging best practices and policies in 
online learning (see appendix A). For program 
leaders and evaluators who are developing an 
evaluation, the quality standards from SREB and 
NACOL provide a basic framework for looking at 
the quality of online courses and teachers. There 
also is a growing body of studies from which 
evaluators can draw lessons and adapt methods; 
evaluators need not reinvent the wheel. At the 
same time, they must exercise caution when ap- 
plying findings, assumptions, or methods from 
other studies, as online programs and resources 
vary tremendously in whom they serve, what 
they offer, and how they offer it. What works 
best for one program evaluation may not be 
appropriate for another. 



Given the lack of commonly used outcome mea- 
sures for online learning evaluations, individual 
programs should at least strive for internal consis- 
tency, as DLC has. If working with multiple ven- 
dors or school sites, online program leaders need 
to articulate a clear set of business rules for what 
data are to be collected and how, distributing 
these guidelines to all parties who are collecting 
information. Looking ahead, without these com- 
mon guidelines, evaluators will be hard pressed 
to compare their program’s outcomes with oth- 
ers. Some of the evaluators featured in this guide 
have made contributions to the field of online 
learning evaluation, like Appleton’s leaders, who 
developed an evaluation model that can be bor- 
rowed or adapted by other programs, and the 
evaluators of Algebra I Online, who published 
their study findings in professional journals. 
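
One way to make such business rules concrete is to write them down in a form that every vendor or school site can check its records against. The following is only a minimal sketch in Python: the field names, the grade codes, and the `check_record` helper are hypothetical and are not drawn from any program described in this guide.

```python
# Hypothetical business rules: field name -> (required type, allowed values or None).
# These names and codes are illustrative assumptions, not rules from any real program.
RULES = {
    "student_id": (str, None),
    "course_id": (str, None),
    "enrolled": (bool, None),
    "final_grade": (str, {"A", "B", "C", "D", "F", "W", "I"}),
}


def check_record(record):
    """Return a list of rule violations for one enrollment record."""
    problems = []
    for field, (expected_type, allowed) in RULES.items():
        if field not in record:
            problems.append(f"missing {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field} has wrong type")
        elif allowed is not None and record[field] not in allowed:
            problems.append(f"{field} has unexpected value")
    return problems


good = {"student_id": "s001", "course_id": "alg1", "enrolled": True, "final_grade": "B"}
bad = {"student_id": "s002", "course_id": "alg1", "final_grade": "Pass"}

print(check_record(good))
print(check_record(bad))
```

Sharing one such rule set with all parties means that records arriving from different sources can be combined without guessing what a missing or unusual value was meant to convey.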

Evaluating Multifaceted Online 
Resources 

Like many traditional education programs, on- 
line learning resources sometimes offer par- 
ticipants a wide range of learning experiences. 
Their multifaceted offerings are a boon for stu- 
dents or teachers with diverse interests, but can 
be a dilemma for evaluators seeking uniform 
findings about effectiveness. In the case of an 
educational Web site like Washington’s DLC, for 
example, different types of users will explore 
different resources; some students may take an 
online course while others may be researching 
colleges or seeking a mentor. Virtual schools that 
use multiple course providers present a similar 
conundrum, and even the same online course 
may offer differentiated learning experiences if, 
for example, students initiate more or less con- 
tact with the course instructor or receive varying 
degrees of face-to-face support from a parent 
or coach. (A similar lack of uniformity can be 
found in traditional settings with different in- 
structors using varying instructional models.) 

When faced with a multifaceted resource, how 
is an evaluator to understand and document the 
online learning experience, much less deter- 
mine what value it adds? 

Several of the evaluations featured in this guide 
encountered this issue, albeit in distinct ways. 
DLC evaluators were challenged to assess how 
students experienced and benefited from the 
Web site’s broad range of resources. Evaluators 
of Maryland Public Television’s Thinkport Web 
site, with its extensive teacher and student re- 
sources from many providers, similarly struggled 
to assess its impact on student achievement. In 
a very different example, the evaluators for the 
Arizona Virtual Academy (AZVA) faced the chal- 
lenge of evaluating a hybrid course that includ- 
ed both online and face-to-face components 
and in which students’ individual experiences 
varied considerably. 

Combine Breadth and Depth to Evaluate 
Resource-rich Web Sites 

With its wide range of services and resources 
for students and teachers, DLC is a sprawling, 
diverse project to evaluate. Through this cen- 
trally hosted Web site, students can access over 
300 online courses, including all core subjects 
and various electives, plus Advanced Placement 
(AP) courses and English as a Second Language. 
DLC also offers students online mentors, college 
and career planning resources, and an extensive 
digital library. In addition, DLC offers other re- 
sources and tools for teachers, including online 
curricula, activities, and diagnostics. For schools 
that sign up to use DLC, the program provides 
training for school personnel to assist them in 
implementing the Web site’s resources. 

Initially, DLC’s evaluation strategy was to col- 
lect broad information about how the Web site 
is used. Later, program leaders shifted their 
strategy to focus on fewer and narrower topics 
that could substantiate the program’s efficacy. 
The evaluators focused on student achieve- 
ment in the online courses and on school-level 
supports for educators to help them make the 
most of DLC’s resources. Together, the series of 
DLC evaluations — there have been at least five 
distinct efforts to date — combine breadth and 
depth, have built on each other’s findings from 
year to year, and have produced important for- 
mative and summative findings (see Glossary of 
Common Evaluation Terms, p. 65). 

In the project’s first year, Debra Friedman, a lead 
administrator at the University of Washington (a 
DLC partner organization), conducted an evalu- 
ation that sought information on whom DLC 
serves and what school conditions and poli- 
cies best support its use. To answer these ques- 
tions, the evaluators selected methods designed 
to elicit information directly from participants, 
including discussions with DLC administrators, 
board members, school leaders, and teachers, as 
well as student and teacher surveys that asked 
about their use of computers and the Internet 
and about the utility of the DLC training. The 
evaluator also looked at a few indicators of stu- 
dent achievement, such as the grades that stu- 
dents received for DLC online courses. 

The evaluation yielded broad findings about 
operational issues and noted the need for DLC 
to prioritize among its many purposes and 
audiences. It also uncovered an important find- 
ing about student achievement in the online 
courses: The greatest percentage of students 
(52 percent) received Fs, but the next greatest 
percentage (37 percent) received As. To explain 
these outcomes, the evaluator pointed to the lack 
of uniformity in students’ motivation and needs, 
and the type of academic support available to 
them. The evaluator also noted the varying qual- 
ity of the vendors who provided courses, finding 
that “some vendors are flexible and responsive 
to students’ needs; others are notably inflexible. 
Some are highly professional operations, others 
less so.” 9 This evaluation also described substan- 
tial variation in how well schools were able to 
support the use of DLC resources. The findings 
helped program administrators who, in response, 
stepped up their efforts to train educators about 
the Web portal’s resources and how to use them 
with students. The evaluation also was a jump- 
ing-off point for future assessments that would 
follow up on the themes of student achievement 
in the online courses and supports for educators 
to help them take advantage of DLC’s offerings. 

In the project’s second year, project leaders 
launched another evaluation. This effort con- 
sisted of student focus groups to identify stu- 
dents’ expectations of DLC’s online courses, stu- 
dent pre-course preparation, overall experience, 
and suggestions for improving the courses and 
providing better support. As a separate effort, 
they also contracted with an independent evalu- 
ator, Cohen Research and Evaluation, to learn 
more about the behavior and motivations of 
students and other users, such as school librar- 
ians, teachers, and administrators. This aspect of 
the evaluation consisted of online surveys with 
students, teachers, and school librarians; and in- 
terviews with selected teachers, librarians, and 
administrators (primarily to help develop survey 
questions). To gain insight into how well stu- 
dents were performing in the classes, the evalu- 
ators analyzed grades and completion rates for 
students enrolled in DLC courses. The evalu- 
ation activities conducted in the second year 
again pointed to the need for more school-level 
support for using DLC resources. The evalua- 
tors found that some schools were excited and 
committed to using the Web site’s resources, 
but were underutilizing them because they lacked 
sufficient structures, such as training and time 
for teachers to learn about its offerings, inter- 
nal communication mechanisms to track student 
progress, and adequate technical support. 

When DLC’s leaders began to contemplate a 
third-year evaluation, they wanted more than 
basic outcome data, such as student grades and 
completion rates. “We can count how many 
courses, we know the favorite subjects, and we 
know the grade averages and all of that,” says 
Judy Margrath-Huge, DLC president and chief ex- 
ecutive officer. What they needed, she explains, 
was to get at the “so what,” meaning they wanted 
to understand “what difference [DLC] makes.” 

The evaluation team knew that if its effort were 
to produce reliable information about DLC’s in- 
fluence on student achievement, it would need 
to zero in on one, or just a few, of the Web 
site’s many components. Some features — DLC’s 
vast digital library, for example — simply were 
not good candidates for the kind of study they 
planned to conduct. As Karl Nelson, DLC’s di- 
rector of technology and operations, explains, 
“It is very difficult to evaluate the effectiveness 
of and to answer a ‘so what’ question about a 
library database, for example. It’s just hard to 
point to a kid using a library database and then 
a test score going up.” Ultimately, says Nelson, 
DLC’s leaders chose to look primarily at the on- 
line courses, believing that this was the feature 
they could best evaluate. 

DLC leaders hired outside evaluators, Fouts & 
Associates, to help them drill down into a spe- 
cific aspect of student achievement — determin- 
ing the role that DLC online courses play in: 1) 
enabling students to graduate from high school 
and 2) helping students become eligible and ful- 
ly prepared for college. In this evaluation, the 
researchers identified a sample of 115 graduated 
seniors from 17 schools who had completed DLC 
courses. The evaluators visited the schools to 
better understand online course-taking policies 
and graduation requirements and to identify DLC 
courses on the transcripts of these 115 students. 
At the school sites, evaluators interviewed school 
coordinators and examined student achievement 
data, student transcripts, and DLC documents. 

The evaluation gave DLC’s leaders what they 
wanted: concrete evidence of the impact of 
DLC online courses. This study showed that 76 
percent of students who took an online course 
through DLC did so because the class was not 
available at their school and that one-third of the 
students in the study would not have graduated 
without the credits from their online course. This 
and other evaluation findings show that “we are 
meeting our mission,” says Margrath-Huge. “We 
are accomplishing what we were set out to ac- 
complish. And it’s really important for us to be 
able to stand and deliver those kinds of messages 
with that kind of data behind us.” DLC has used 
its evaluation findings in multiple ways, includ- 
ing when marketing the program to outsiders, 
to demonstrate its range of offerings and its ef- 
fect on students (see fig. 2, Excerpt from Digital 
Learning Commons’ Meeting 21st Century Learn- 
ing Challenges in Washington State, p. 24). 
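
The counts reported for this study can be checked with simple arithmetic. The short Python sketch below uses only the numbers given in figure 2 (115 graduates, 59 college-eligible students, 36 of whom took advanced classes). The variable names are illustrative, and the exact number of students behind the "approximately 33%" finding is not given in the source, so that value is only an estimate.

```python
# Counts as reported in the study excerpt (figure 2).
graduates_in_sample = 115
college_eligible = 59
took_advanced_classes = 36

# Share of college-eligible students who took advanced classes.
advanced_share = took_advanced_classes / college_eligible
print(f"{advanced_share:.0%}")  # 61%

# "Approximately 33%" of the 115 graduates corresponds to roughly this many
# students. This is an estimate; the source does not state the exact count.
approx_needed_dlc_credit = round(0.33 * graduates_in_sample)
print(approx_needed_dlc_credit)  # 38
```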

It would be impossible to conduct a compre- 
hensive evaluation of everything that DLC has 
to offer, but certainly the evaluation strategy 
of combining breadth and depth has given it a 
great deal of useful information. DLC’s leaders 
have used the findings from all the evaluations 
to improve their offerings and to demonstrate 
effectiveness to funders and other stakeholders. 

In Maryland, Thinkport evaluators faced a similar 
challenge in trying to study a vast Web site that 
compiles educational resources for teachers and 
students. At first, the evaluation team from Mac- 
ro International, a research, management, and 
information technology firm, conducted such ac- 
tivities as gathering satisfaction data, reviewing 
Web site content, and documenting how the site 
was used. But over time, project leaders were 
asked by funders to provide more concrete evi- 
dence about Thinkport’s impact on student per- 
formance. The evaluation (and the project itself) 
had to evolve to meet this demand. 

In response, the team decided to “retrofit” the 
evaluation in 2005, settling on a two-part evalu- 
ation that would offer both breadth and depth. 
First, the evaluators surveyed all registered us- 
ers about site usage and satisfaction, and sec- 
ond, they designed a randomized controlled 
trial (see Glossary of Common Evaluation Terms, 
p. 65) to study how one of the site’s most popu- 
lar features — an “electronic field trip” — affected 
students’ learning. Several field trips had been 
developed under this grant; the one selected was 
Pathways to Freedom, about slavery and the Un- 
derground Railroad. This particular product was 
chosen for a number of reasons: most middle 
school social studies curricula include the topic; 
the evaluators had observed the field trip in 
classrooms in an earlier formative study and were 
aware of students’ high interest in its topics and 
activities; and Web site statistics indicated that it 
was a heavily viewed and utilized resource. 

Figure 2. Excerpt From Digital Learning Commons' Meeting 21st Century 
Learning Challenges in Washington State* 

Independent research demonstrates 
increased on-time graduation rates 
and college/workforce readiness. 

The results are clear: DLC access 
to online courses increases on-time 
graduation rates at schools studied in 
Washington State. 

When online courses are made 
available through the DLC to 
students who would not otherwise 
have had access to that course 
(whether for purposes of 
remediation, advanced placement, 
or college entrance), it makes a 
significant difference, increasing 
graduation rates and college/ 
workforce readiness. 

Research focused on 
online courses 

The DLC has focused their 
evaluation research on the impact 
from online courses, as outcomes 
and results can be objectively 
gathered and tabulated. Over two 
years worth of data demonstrate 
consistent results. 

2006 Evaluation Results 

In the spring of 2006 researchers 
from Fouts & Associates analyzed 
the transcripts of approximately 
115 students at seventeen DLC- 
participating high schools across the 
state. Quantitative and qualitative 
data were gathered from transcripts, 
student achievement data, DLC 
documents, and school coordinators 
to identify whether access to online 
courses through the DLC could 
objectively be shown to make a 
difference. 

Online Course Registrations 

When the DLC was launched, 
online course enrollment was 
projected to reach 200 students. 
During the 2004-05 school year 
alone, however, 1,159 students 
from forty-two high schools took 
an online course. So, what courses 
are students taking online? Our data 
indicate significant growth in foreign 
languages over the last year. Our 
2004-05 statistics on enrollment in 
advanced coursework are consistent 
with those of NCES, which reports that 
14% of enrollments nationally are in 
AP or college-level courses. 

1. INCREASED GRADUATION 
RATES: Of the 115 students 
who graduated, approximately 
33% would NOT have graduated 
without a course made available 
through the DLC. 

2. COLLEGE AND WORKFORCE 
READINESS: Of the fifty-nine 
students who were college 
eligible, thirty-six students 
(61%) took advanced classes 
to better prepare themselves 
for college. 

* The U.S. Department of Education does not mandate or prescribe particular curricula or lesson plans. The information in this figure was provided by the 
identified site or program and is included here as an illustration of only one of many resources that educators may find helpful and use at their option. The 
Department cannot ensure its accuracy. Furthermore, the inclusion of information in this figure does not reflect the relevance, timeliness, or completeness of this 
information; nor is it intended to endorse any views, approaches, products, or services mentioned in the figure. 

The evaluators gave pre- and posttests of content 
knowledge about slavery and the Underground 
Railroad to students whose teachers used the 
electronic field trip and control groups of stu- 
dents whose teachers used traditional instruction. 
They found that the electronic field trip had a 
very substantial positive impact on student learn- 
ing, particularly among students whose teachers 
had previously used it: These students of experi- 
enced teachers scored 121 percent higher on the 
content knowledge test than the students whose 
teachers used traditional instruction. 
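
A statement such as "scored 121 percent higher" expresses the gap between two group averages as a percentage of the comparison group's average. The Python sketch below shows one common way to compute such a figure. It assumes the comparison is between group mean scores, which the guide does not specify, and the scores are invented for illustration; they are not Thinkport data.

```python
from statistics import mean


def percent_higher(treatment_scores, control_scores):
    """Gap between group means, as a percentage of the control group mean."""
    treatment_mean = mean(treatment_scores)
    control_mean = mean(control_scores)
    return 100 * (treatment_mean - control_mean) / control_mean


# Hypothetical test scores, invented for illustration only.
control = [10, 12, 8, 10]        # mean is 10
experienced = [20, 24, 22, 22]   # mean is 22

print(percent_higher(experienced, control))  # 120.0
```

Note that a result above 100 percent means the treatment group's average is more than double the comparison group's average; in this invented example, 22 is 2.2 times 10.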

Like the DLC evaluation, the Thinkport evalua- 
tion proved useful both for formative and sum- 
mative purposes. Thinkport’s leaders learned 
that teachers who were new to the electronic 
field trip needed more training and experience 
to successfully incorporate it into their class- 
rooms. They also learned that once teachers 
knew how to use the tool, their students learned 
the unit’s content far better than their peers in 
traditional classes. The evaluators’ two-part plan 
gave Thinkport’s leaders what they needed: 
broad information about usage and satisfaction 
and credible evidence that a frequently used 
feature has a real impact on students. 

Use Multiple Methods to Capture Wide-ranging 
Student Experiences in Online Courses 

In the examples above, evaluators struggled to 
wrap their arms around sprawling Web resourc- 
es that lacked uniformity. Sometimes a similar 
challenge is found at the micro level, as when 
students have heterogeneous experiences in 
the same online class. The leaders of Arizona’s 
AZVA struggled with this problem when they set 
out to evaluate one of their online courses. 

In 2006, AZVA school leaders began to ex- 
periment with hybrid courses — regular online 
classes supplemented with weekly face-to-face 
lessons from a classroom teacher. The in-per- 
son component was originally designed in a 
very structured way: Students received class- 
room instruction every week at a specific time 
and location, and they had to commit to this 
weekly instruction for an entire semester. In ad- 
dition, students could participate in the hybrid 
class only if they were working either on grade 
level or no more than one level below grade 
level. These restrictions allowed the face-to-face 
teachers to offer the same lessons to all students 
during the weekly session. School leaders spe- 
cifically designed this structure to bring unifor- 
mity to students’ experiences and make it easier 
to evaluate the class. As AZVA’s director, Mary 
Gifford, explains, “We wanted the hybrid expe- 
rience to be the same for all the kids so we 
could actually determine whether or not it is 
increasing student achievement.” 

However, when program leaders surveyed par- 
ents at the semester break, Gifford says, “Par- 
ents offered some very specific feedback.” They 
didn’t like the semester-long, once-a-week com- 
mitment, and they argued that the structure 
prevented students from working at their own 
pace. Instead, parents wanted a drop-in model 
that would offer students more flexibility and 
tailored assistance. In response, she says, “We 
totally overhauled the course for the second se- 
mester and made it a different kind of a model.” 
In the new format, teachers do not deliver pre- 
pared lessons, but, instead, work with students 
one-on-one or in small groups on any topic with 
which a student is struggling. 

While the flexibility of the new model meets 
student needs, students naturally will have more 
varied experiences using it and school leaders 
will not be able to isolate its effect on student 
achievement. In other words, observed gains 
could be due to any number of factors, such as 
how frequently the student drops in, whether 
a student works one-on-one with a teacher or 
in a small group, and what content is covered 
during the drop-in session. In this instance, the 
needs of program participants necessarily out- 
weighed those of the evaluation. 

However, because the school has a number of 
other data collection efforts in place, Gifford and 
her colleagues will still be able to gather infor- 
mation about whether the hybrid model is help- 
ing students. School administrators track how 
frequently students attend the drop-in class and 
chart students’ academic progress through the 
curriculum both before and after they partici- 
pate in the hybrid program. AZVA also separate- 
ly examines state test scores for students who 
attend the hybrid program on a regular basis. In 
addition, AZVA frequently uses surveys of par- 
ents, students, and teachers to gather informa- 
tion about the effectiveness of many aspects of 
their program, including the hybrid class. These 
kinds of activities can provide important insights 
when controlled studies are impossible. 

Summary 

Though multifaceted resources can make it dif- 
ficult for evaluators to gauge effectiveness, good 
evaluations, especially those using multiple, 
complementary research methods, can identify 
the circumstances under which the program or 
resource is most likely to succeed or fail and can 
generate useful recommendations for strength- 
ening weak points. Evaluators who are studying 
multifaceted resources should consider a strat- 
egy that combines both breadth and depth. 

If studying an educational Web site that offers an 
array of resources, evaluators might collect broad 
information about site usage and then select one 
or two particular features to examine in more 
depth. Program leaders can facilitate this process 
by clearly articulating what each resource is in- 
tended to do, or what outcomes they would hope 
to see if the resource was being used effectively. 
From this list, program leaders and evaluators 
can work together to determine what to study 
and how. In some instances, it might be logical 
to design a multiyear evaluation that focuses on 
distinct program components in different years, 
or collects broad information in the first year, and 
narrows in focus in subsequent years. 

If evaluating a particular course or resource that 
offers students a wide range of experiences, 
evaluators might consider using a mix of quanti- 
tative and qualitative methods to provide a well- 
rounded assessment of it. Rich, descriptive in- 
formation about students’ experiences with the 
course or resource can be useful when trying to 
interpret data about student outcomes. 

Finding Appropriate Comparison 
Groups 

When evaluators have research questions about 
an online program’s impact on student achieve- 
ment, their best strategy for answering those 
questions is often an experimental design, like 
a randomized controlled trial, or a quasi-experi- 
mental design that requires them to find matched 
comparison groups. These two designs are the 
best, most widely accepted methods for deter- 
mining program effects (see table 2, p. 28). 

Evaluators who use these methods must proceed 
carefully. They must first ensure that compari- 
sons are appropriate, taking into consideration 
the population served and the program’s goals 
and structure. For example, online programs 
dedicated to credit recovery would want to com- 
pare their student outcomes with those of other 
credit recovery programs because they are serv- 
ing similar populations. Evaluators must also be 
scrupulous in their efforts to find good compari- 
son groups — often a challenging task. For a par- 
ticular online class, there may be no correspond- 
ing face-to-face class. Or it may be difficult to 
avoid a self-selection bias if students (or teach- 
ers) have chosen to participate in an online pro- 
gram. Or the online program might serve a wide 
range of students, possibly from multiple states, 
with a broad range of ages, or with different lev- 
els of preparation, making it difficult to compare 
to a traditional setting or another online program. 
Evaluators attempting to conduct a randomized 
controlled trial can encounter an even greater 
challenge in devising a way to randomly assign 
students to receive a treatment (see Glossary of 
Common Evaluation Terms, p. 65) online. 
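
Where random assignment is feasible, the mechanics can be simple. The following Python sketch assigns whole classrooms, rather than individual students, to treatment and control groups. The classroom labels, the seed value, and the `assign_groups` helper are hypothetical; a real study would also need to settle the unit of assignment and any stratification with its evaluators.

```python
import random


def assign_groups(units, seed):
    """Randomly split units into treatment and control groups of equal size
    (the control group gets the extra unit when the count is odd)."""
    rng = random.Random(seed)   # a fixed seed makes the assignment reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "treatment": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }


# Hypothetical classroom labels.
classrooms = [f"class_{i:02d}" for i in range(1, 9)]
groups = assign_groups(classrooms, seed=2008)

print(groups["treatment"])
print(groups["control"])
```

Recording the seed alongside the assignment lets others verify that the groups were formed by chance and not by choice.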

How might evaluators work around these com- 
parison group difficulties? 

Several of the evaluations featured in this guide 
sought to compare distance learning programs 
with face-to-face learning settings, and they 
took various approaches to the inherent techni- 
cal challenges of doing so. Despite some dif- 
ficulties along the way, the evaluators of online 
projects in Louisiana, Alabama, and Maryland all 
successfully conducted comparative studies that 
yielded important findings for their programs. 

Identify Well-matched Control Groups for 
Quasi-experimental Studies 

In designing the evaluation for Louisiana’s Alge- 
bra I Online, program leaders in the Louisiana 
Department of Education wanted to address the 
bottom line they knew policymakers cared about 
most: whether students in the program’s online 
courses were performing as well as students 
studying algebra in a traditional classroom. 

To implement the quasi-experimental design of 
their evaluation (see Glossary of Common Evalu- 
ation Terms, p. 65), the program’s administrators 
needed to identify traditional algebra classrooms 
to use as controls. The idea was to identify stan- 
dard algebra courses serving students similar to 
those participating in the Algebra I Online pro- 
gram and then give pre- and post-course tests to 
both groups to compare how much each group 
learned, on average. Of course, in a design such 
as this, a number of factors besides the class 
format (online or face-to-face) could affect stu- 
dents’ performance: student-level factors, such as 
individual ability and home environment; teach- 
er-level factors, such as experience and skill in 
teaching algebra; and school-level factors, such 
as out-of-class academic supports for students. 
Clearly, it was important to the evaluation to find 
the closest matches possible. 
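
The basic comparison described above, how much each group learned on average between the pre- and post-course tests, can be sketched in a few lines of Python. The scores below are invented for illustration and are not Algebra I Online data. The sketch makes no adjustment for the student-, teacher-, or school-level factors just mentioned, which is why well-matched groups matter.

```python
from statistics import mean


def mean_gain(records):
    """Average posttest-minus-pretest gain for a list of (pretest, posttest) pairs."""
    return mean(post - pre for pre, post in records)


# Hypothetical (pretest, posttest) scores, invented for illustration only.
online = [(40, 62), (55, 70), (48, 66), (60, 78)]
face_to_face = [(42, 60), (57, 71), (50, 64), (58, 75)]

online_gain = mean_gain(online)
face_to_face_gain = mean_gain(face_to_face)

print(online_gain)                      # 18.25
print(face_to_face_gain)                # 15.75
print(online_gain - face_to_face_gain)  # 2.5
```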

Table 2. Evaluation Design Characteristics 

Design: Experimental design 
Characteristics: Incorporates random assignment of participants to 
treatment and control groups. The purpose of randomization is to ensure 
that all possible explanations for changes (measured and unmeasurable) 
in outcomes are taken into account, randomly distributing participants 
in both the treatment and control groups so there should be no 
systematic baseline differences. Treatment and control groups are 
compared on outcome measures. Any differences in outcomes may be 
assumed to be attributable to the intervention. 
Advantages: Most sound or valid study design available. Most accepted 
in scientific community. 
Disadvantages: Institutional policy guidelines may make random 
assignment impossible. 

Design: Quasi-experimental design 
Characteristics: Involves developing a treatment group and a carefully 
matched comparison group (or groups). Differences in outcomes between 
the treatment and comparison groups are analyzed, controlling for 
baseline differences between them on background characteristics and 
variables of interest. 
Advantages: More practical in most educational settings. Widely 
accepted in scientific community. 
Disadvantages: Finding and choosing suitable treatment and comparison 
groups can be difficult. Because of nonrandom group assignment, the 
outcomes of interest in the study may have been influenced not only by 
the treatment but also by variables not studied. 

Source: Adapted from U.S. Department of Education, Mobilizing for Evidence-Based Character Education (2007). Available from http://www.ed.gov/programs/charactered/mobilizing.pdf. 

Under a tight timeline, program administrators 
worked with all their participating districts to 
identify traditional classrooms that were matched 
demographically to the online classes and, then, 
administered a pretest of general mathematics 
ability to students in both sets of classes. External 
evaluators from the Education Development Cen- 
ter (EDC) joined the effort at this point. Although 
impressed by the work that the program staff had 
accomplished given time and budget constraints, 
the EDC evaluators were concerned about the 
quality of matches between the treatment and 
control classes. Finding good matches is a diffi- 
cult task under the best of circumstances and, in 
this case, it proved even more difficult for non- 
evaluators, that is, program and school adminis- 
trators. The quality of the matches was especially 
problematic in small districts or nonpublic schools 
that had fewer control classrooms from which to 
choose. EDC evaluator Rebecca Carey says, “To 
their credit, [the program’s administrators] did the 



best they could in the amount of time they had.” 
Still, she adds, “the matches weren't necessarily 
that great across the control and the experimen- 
tal. . . . Had we been involved from the beginning, 
we might have been a little bit more stringent 
about how the control schools would match to 
the intervention schools and, maybe, have made 
the selection process a little bit more rigorous.” 

Ultimately, the evaluation team used students’ 
pretest scores to gauge whether the treatment 
and control group students started the course 
with comparable skills and knowledge and em- 
ployed advanced statistical techniques to help 
control for some of the poor matches. 
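
The guide does not name the statistical techniques the evaluators used. As an illustration only, the sketch below shows one common way to handle imperfect matches: first measure the baseline gap on the pretest, then adjust the posttest difference for that gap using the pooled within-group slope of posttest on pretest (a simple covariate adjustment). The function names and all scores are invented for the example and are not taken from the Algebra I Online evaluation.

```python
from statistics import mean, variance


def standardized_difference(treat, control):
    """Baseline gap between two groups, in pooled standard deviation units."""
    n_t, n_c = len(treat), len(control)
    pooled_var = (
        (n_t - 1) * variance(treat) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treat) - mean(control)) / pooled_var ** 0.5


def pooled_within_slope(groups):
    """Slope of posttest on pretest, pooled across groups.

    groups is a list of (pretest_scores, posttest_scores) pairs.
    """
    sxy = 0.0
    sxx = 0.0
    for pre, post in groups:
        pre_mean, post_mean = mean(pre), mean(post)
        sxy += sum((x - pre_mean) * (y - post_mean) for x, y in zip(pre, post))
        sxx += sum((x - pre_mean) ** 2 for x in pre)
    return sxy / sxx


def adjusted_difference(pre_t, post_t, pre_c, post_c):
    """Posttest gap after removing the part predicted by the pretest gap."""
    slope = pooled_within_slope([(pre_t, post_t), (pre_c, post_c)])
    raw_gap = mean(post_t) - mean(post_c)
    baseline_gap = mean(pre_t) - mean(pre_c)
    return raw_gap - slope * baseline_gap


# Invented scores for illustration only (not data from the evaluation).
pre_online = [10, 12, 14, 16]
post_online = [15, 17, 19, 21]
pre_control = [8, 10, 12, 14]
post_control = [8, 10, 12, 14]

baseline = standardized_difference(pre_online, pre_control)
effect = adjusted_difference(pre_online, post_online, pre_control, post_control)
print(f"Baseline gap: {baseline:.2f} SD; adjusted posttest difference: {effect:.1f}")
```

In this invented data the online group starts 2 points ahead on the pretest, so the raw 7-point posttest gap shrinks to 5 points once the head start is taken into account. An adjustment of this kind can only correct for differences that were measured, which is why the quality of the original matches still matters.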

In their report, the evaluators also provided data 
on differences between the control and treatment 



groups (e.g., student characteristics, state test 
scores in math, size of the school), and they 
drew from other data sources (including surveys 
and observations) to triangulate their findings 
(see Common Problems When Comparing On- 
line Programs to Face-to-Face Programs, p. 30). 
To other programs considering a comparative 
study, the Algebra I Online evaluators recom- 
mend involving the evaluation team early in the 
planning process and having them supervise the 
matching of treatment and control groups. 

The evaluators of the Alabama Connecting
Classrooms, Educators, & Students Statewide
Distance Learning (ACCESS) initiative similarly 
planned a quasi-experimental design and needed 
traditional classes to use as matches. ACCESS pro- 
vides a wide range of distance courses, including 
core courses, electives, remedial courses, and 
advanced courses, which are Web-based,
use interactive videoconferencing (IVC) platforms,
or combine both technologies.
In the case of IVC courses, distance learners
receive instruction from a teacher who is deliver- 
ing a face-to-face class at one location while the 
distance learners participate from afar. 

When external evaluators set out to compare the 
achievement of ACCESS’s IVC students to that of
students in traditional classrooms, they decided 
to take advantage of the program’s distinctive 
format. As controls, they used the classrooms 
where the instruction was delivered live by the 
same instructor. In other words, the students at 
the site receiving the IVC feed were considered 
the treatment group, and students at the sending 
site were the control group. This design helped 
evaluators to isolate the effect of the class format 
(IVC or face-to-face) and to avoid capturing the 
effects of differences in style and skill among 



teachers, a problem they would have had if the 
treatment and control classes were taught by 
different people. Martha Donaldson, ACCESS’s
lead program administrator, says, “We were 
looking to see if it makes a difference whether 
students are face-to-face with the teacher or if 
they’re receiving instruction in another part of 
the state via the distance learning equipment.” 
To compare performance between the two 
groups of students, evaluators gathered a range 
of data, including grades, scores on Advanced 
Placement tests, if relevant, and enrollment and 
dropout data. The design had some added logis- 
tical benefits for the evaluators: it was easier to 
have the control classrooms come from schools 
participating in ACCESS, rather than having to 
collect data from people who were unfamiliar 
with the program. 

Despite these benefits, the strategy of using IVC 
sending sites as control groups did have a few 
drawbacks. For instance, the evaluators were 
not able to match treatment and control groups 
on characteristics that might be important, such 
as student- and school-level factors. It is pos- 
sible that students in the receiving sites attend- 
ed schools with fewer resources, for example, 
and the comparison had no way to control for 
that. For these reasons, the ACCESS evaluators 
ultimately chose not to repeat the comparison 
between IVC sending and receiving sites the 
following year. They did, however, suggest that 
such a comparison could be strengthened by 
collecting data that gives some indication of 
students’ pretreatment ability level — GPA, for 
example — and using statistical techniques to 
control for differences. Another strategy, they 
propose, might be to limit the study to such a 
subject area as math or foreign language, where 
courses follow a sequence and students in the 
Common Problems When Comparing Online Programs to Face-to-Face Programs

Evaluators need to consider both student- and school- or classroom-level variables when they compare online and 
face-to-face programs. At the student level, many online program participants enroll because of a particular cir- 
cumstance or attribute, and thus they cannot be randomly assigned— for example, a student who takes an online 
course over the summer to catch up on credits. The inherent selection bias makes it problematic to compare the 
results of online and face-to-face students. Evaluators' best response is to find, wherever possible, control groups 
that are matched as closely as possible to the treatment groups; this includes matching for student demographic 
characteristics; their reason for taking the course (e.g., credit recovery); and their achievement level. Classroom- 
and school-level factors complicate comparisons as well. If the online program is more prevalent in certain types 
of schools (e.g., rural schools) or classrooms (e.g., those lacking a fully certified teacher), then the comparison 
unintentionally can capture the effects of these differences. Evaluators need to understand and account for these 
factors when selecting control groups. 



same course (whether online or traditional) 
would have roughly similar levels of ability. 

Anticipate the Challenges of Conducting a 
Randomized Controlled Trial 

Evaluators in Maryland had a different challenge 
on their hands when they were engaged to study 
Thinkport and its wide-ranging education offer- 
ings — including lesson plans, student activities, 
podcasts,* video clips, blogs,** learning games, 
and information about how all of these things 
can be used effectively in classrooms. When 
asked by a key funder to evaluate the program’s 
impact on student learning, the evaluation team 
chose to study one of Thinkport’s most pop- 
ular features, its collection of “electronic field 
trips,” each one a self-contained curricular unit 
that includes rich multimedia content (delivered 

* Podcasts are audio files that are distributed via the Internet, which can be played 
back on computers to augment classroom lessons. 

** Blogs are regularly updated Web sites that usually provide ongoing information on 
a particular topic or serve as personal diaries, and allow readers to leave their own 
comments. 



online) and accompanying teacher support ma- 
terials that assist with standards alignment and 
lesson planning. 

In collaboration with its evaluation partner, 
Macro International, the program’s parent orga- 
nization, Maryland Public Television, set out to 
study how the Pathways to Freedom electronic 
field trip impacted student learning in the class- 
room and whether it added value. Rather than 
conducting a quasi-experimental study in which 
the evaluator would have to find control groups 
that demographically matched existing groups 
that were receiving a treatment, the Thinkport 
evaluators wanted an experimental design study 
in which students were assigned randomly to 
either treatment or control groups. 

Although they knew it would require some extra 
planning and coordination, the evaluators chose 
to conduct a randomized controlled trial. This de- 
sign could provide them with the strongest and 
most reliable measure of the program’s effects. 
But first, there were challenges to overcome. If 




students in the treatment and control groups were 
in the same classroom, evaluators thought, they 
might share information about the field trip and 
“contaminate” the experiment. Even having treat- 
ment and control groups in the same school could 
cause problems: In addition to the possibility of 
contamination, the evaluators were concerned 
that teachers and students in control classrooms 
would feel cheated by not having access to the 
field trip and would complain to administrators. 

To overcome these challenges and maintain the 
rigor of the experimental design, program lead- 
ers decided to randomize at the school level. 
They recruited nine schools in two districts and 
involved all eighth-grade social studies teachers 
in each school, a total of 23 teachers. The evalu- 
ators then matched the schools based on student 
demographics, teacher data, and student scores 
on the state assessment. (One small school was 
coupled with another that matched it demo- 
graphically, and the two schools were counted 
as one.) The evaluators then randomly identified 
one school in each pair as a treatment school 
and one as a control school. Teachers did not 
know until training day whether they had been 
selected as a treatment or control. (The control 
group teachers were told that they would be 
given an orientation and would be able to use 
the electronic field trip after the study.) 
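
The pair-then-randomize step described above can be sketched in a few lines. This is a minimal illustration, not the evaluators' actual procedure: the school names are hypothetical, the pairs are assumed to have been formed already from demographic and assessment data, and the function name `assign_within_pairs` is invented for the example.

```python
import random


def assign_within_pairs(pairs, seed=None):
    """Randomly pick one school in each matched pair for treatment.

    pairs is a list of (school_a, school_b) tuples that were matched
    beforehand. Returns a dict mapping each school to "treatment" or
    "control".
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    assignment = {}
    for school_a, school_b in pairs:
        treated = rng.choice((school_a, school_b))
        control = school_b if treated == school_a else school_a
        assignment[treated] = "treatment"
        assignment[control] = "control"
    return assignment


# Hypothetical matched pairs (placeholders, not the schools in the study).
matched_pairs = [
    ("School A", "School B"),
    ("School C", "School D"),
    ("School E", "School F"),
    ("School G", "School H"),
]

assignment = assign_within_pairs(matched_pairs, seed=2008)
for school, group in sorted(assignment.items()):
    print(f"{school}: {group}")
```

Randomizing within pairs guarantees that each treatment school has a control school with a similar profile, while chance alone decides which member of the pair receives the electronic field trip first.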

A second challenge for the evaluation team was 
ensuring that teachers in the control classrooms 
covered the same content as the teachers who 
were using the electronic field trip — a problem 
that might also be found in quasi-experimental 
designs that require matched comparison groups. 
They were concerned because the electronic 
field trip devotes six class periods to the topic of 
slavery and the Underground Railroad — perhaps 



more time than is typical in a regular classroom. 
To ensure that students in both groups would 
spend a similar amount of time on the Under- 
ground Railroad unit and have varied resources to 
use, the evaluators provided additional curricular 
materials to the control teachers, including books, 
DVDs, and other supplemental materials. On each 
of the six days they delivered the unit, all teach- 
ers in the study completed forms to identify the 
standards they were covering, and they also com- 
pleted a form at the end of the study to provide 
general information about their lessons. During 
the course of the unit, the evaluators found that 
the control teachers began working together to 
pool their resources and develop lesson plans. 
The evaluators did not discourage this interaction, 
believing that it increased the control teachers’ 
ability to deliver the content effectively and, ul- 
timately, added credibility to the study. To com- 
pare how well students learned the content of the 
unit, the evaluators assessed students’ knowledge 
of slavery and the Underground Railroad before 
and after the instructional unit was delivered. 

The evaluators’ approach to these challeng- 
es was thoughtful and effective. By balancing 
the experimental design concept with practical 
considerations, the evaluation team was able to 
get the information they wanted and success- 
fully complete the study. 

Summary 

The evaluations of Algebra I Online, ACCESS, 
and Thinkport illustrate a variety of approaches 
to constructing comparative studies. For program 
leaders who are considering an evaluation that 
will compare the performance of online and tra- 
ditional students, there are several important con- 
siderations. First, program leaders should work 
Highlights From the Three Comparative Analyses Featured in This Section

The comparative analyses described in this section produced a number of important findings for program staff and 
leaders to consider. 

Algebra I Online (Louisiana). A quasi-experimental study that compared students' performance on a posttest of 
algebra content knowledge showed that students who participated in the Algebra I Online course performed at least 
as well as those who participated in the traditional algebra I course, and on average outscored them on 18 of the
25 test items. In addition, the data suggested that students in the online program tended to do better than control 
students on those items that required them to create an algebraic expression from a real-world example. A major- 
ity of students in both groups reported having a good or satisfactory learning experience in their algebra course, 
but online students were more likely to report not having a good experience and were less likely to report feeling 
confident in their algebra skills. Online students reported spending more time interacting with other students about 
the math content of the course or working together on course activities than their peers in traditional algebra
classrooms; the amount of time they spent socializing, interacting to understand assignment directions, and working
together on in-class assignments or homework was about the same. The evaluation also suggested that teacher 
teams that used small group work and had frequent communication with each other were the most successful. 

ACCESS (Alabama). Overall, the evaluations of the program found that ACCESS was succeeding in expanding ac- 
cess to a range of courses and was generally well received by users. The comparative analyses suggested that stu- 
dents taking Advanced Placement (AP) courses from a distance were almost as likely to receive a passing course 
grade as those students who received instruction in person, and showed that both students and faculty found the 
educational experience in the distance courses was equal to or better than that of traditional, face-to-face courses. 
The evaluation also found that in the fall semester of 2006, the distance course dropout rate was significantly lower 
than nationally reported averages. 

Thinkport (Maryland). The randomized controlled trial initially revealed that, compared to traditional instruction, the 
online field trip did not have a significant positive or negative impact on student learning. However, further analysis 
revealed that teachers using the electronic field trip for the first time actually had less impact on student learning than
those using traditional instruction, while teachers who had used the electronic field trip before had a significantly 
more positive impact. In a second phase of the study, the evaluators confirmed the importance of experience with the 
product: when teachers in one district used the electronic field trip again, they were much more successful on their 
second try, and their students were found to have learned far more than students receiving traditional instruction. 



with an evaluator to ensure that comparisons are 
appropriate. Together, they will want to take into 
account the program’s goals, the student popula- 
tion served, and the program’s structure. 

Second, program leaders should clearly articulate 
the purpose of the comparison. Is the evaluation 
seeking to find out if the online program is just as 



effective as the traditional one, or more effective? 
In some instances, when online programs are be- 
ing used to expand access to courses or teachers, 
for example, a finding of “no significant differ- 
ence” between online and traditional formats can 
be acceptable. In these cases, being clear about 
the purpose of the evaluation ahead of time will 
help manage stakeholders’ expectations. 
If considering a quasi-experimental design,
evaluators will want to plan carefully for what 
classes will be used as control groups, and as- 
sess the ways they are different from the treat- 
ment classes. They will want to consider what 
kinds of students the class serves, whether the 
students (or teachers) chose to participate in the 
treatment or control class, whether students are 
taking the treatment and control classes for simi- 
lar reasons (e.g., credit recovery), and whether 
the students in the treatment and control classes 
began the class at a similar achievement level. If 
a randomized controlled trial is desired, evalu- 
ators will need to consider how feasible it is 
for the particular program. Is it possible to ran- 
domly assign students either to receive the treat- 
ment or be in the control group? Can the control 
group students receive the treatment at a future 
date? Will control and treatment students be in 
the same classroom or school, and if so, might 
this cause “contamination” of data? 

Finally, there are a host of practical consider- 
ations if an evaluation will require collecting 
data from control groups. As we describe further 
in the next section, program leaders and evalua- 
tors need to work together to communicate the 
importance of the study to anyone who will col- 
lect data from control group participants, and 
to provide appropriate incentives to both the 
data collectors and the participants. The impor- 
tance of these tasks can hardly be overstated: 
The success of a comparative study hinges on 
having adequate sets of data to compare. 

Solving Data Collection Problems 

Evaluators of any kind of program frequently 
face resistance to data collection efforts. Surveys 



or activity logs can be burdensome, and analyz- 
ing test scores or grades can seem invasive. In 
the online arena, there can be additional obsta- 
cles. The innovative nature of online programs 
can sometimes create problems: Some online 
program administrators simply may be struggling 
to get teachers or students to use a new technol- 
ogy, let alone respond to a questionnaire about 
the experience. And in the context of launching 
a new program or instructional tool, program 
staffers often have little time to spend on such 
data collection matters as tracking survey takers 
or following up with nonrespondents. 

Still more difficult is gaining cooperation from 
people who are disconnected from the program 
or evaluation — not uncommon when a new 
high-tech program is launched in a decades-old 
institution. Other difficulties can arise when on- 
line programs serve students in more than one 
school district or state. Collecting test scores or 
attendance data, for example, from multiple bu- 
reaucracies can impose a formidable burden. 
Privacy laws present another common hurdle 
when online evaluators must deal with regula- 
tions in multiple jurisdictions to access secondary 
data. The problem can be compounded when 
officials are unfamiliar with a new program and 
do not understand the evaluation goals. 

When faced with these data collection challeng- 
es, how should evaluators respond, and how 
can they avoid such problems in the first place? 

Among the evaluations featured in this guide, 
data collection problems were common. When 
study participants did not cooperate with data 
collection efforts, the evaluators of Chicago Pub- 
lic Schools’ Virtual High School and Louisiana’s 
Algebra I Online program handled the problem
by redesigning their evaluations. Thinkport
evaluators headed off the same problem by tak- 
ing proactive steps to ensure cooperation and 
obtain high data collection rates in their study. 
When evaluators of Washington’s Digital Learn- 
ing Commons (DLC) struggled to collect and 
aggregate data across multiple private vendors 
who provided courses through the program, 
the program took steps to define indicators and 
improve future evaluation efforts. As these ex- 
amples show, each of these challenges can be 
lessened with advance planning and communi- 
cation. While these steps can be time-consum- 
ing and difficult, they are essential for collecting 
the data needed to improve programs. 

Redesign or Refocus the Evaluation When 
Necessary 

In 2004, Chicago Public Schools (CPS) estab- 
lished the Virtual High School (CPS/VHS) to 
provide students with access to a wide range of 
online courses taught by credentialed teachers. 
The program seemed like an economical way to 
meet several district goals: Providing all students 
with highly qualified teachers; expanding access 
to a wide range of courses, especially for tra- 
ditionally underserved students; and addressing 
the problem of low enrollment in certain high 
school courses. Concerned about their students’ 
course completion rates, CPS/VHS administra- 
tors wanted to learn how to strengthen student 
preparedness and performance in their program. 
Seeing student readiness as critical to a success- 
ful online learning experience, the project’s in- 
dependent evaluator, TA Consulting, and district 
administrators focused on student orientation 
and support; in particular, they wanted to assess 
the effectiveness of a tutorial tool developed to 
orient students to online course taking. 



At first, evaluators wanted a random selection 
of students assigned to participate in the orien- 
tation tutorial in order to create treatment and 
control groups (see Glossary of Common Evalu- 
ation Terms, p. 65) for an experimental study. 
While the district approved of the random as- 
signment plan, many school sites were not fa- 
miliar with the evaluation and did not follow 
through on getting students to take the tutorial 
or on tracking who did take it. 

To address this problem, the researchers changed 
course midway through the study, refocusing it 
on the preparedness of in-class mentors who sup- 
ported the online courses. This change kept the 
focus on student support and preparation, but 
sidestepped the problem of assigning students 
randomly to the tutorial. In the revised design, 
evaluators collected data directly from partici- 
pating students and mentors, gathering informa- 
tion about students’ ability to manage time, the 
amount of on-task time students needed for suc- 
cess, and the level of student and mentor tech- 
nology skills. The evaluators conducted surveys 
and focus groups with participants, as well as 
interviews with the administrators of CPS/VHS 
and Illinois Virtual High School (IVHS), the um- 
brella organization that provides online courses 
through CPS/VHS and other districts in the state. 
They also collected data on student grades and 
participation in orientation activities. To retain a 
comparative element in the study, evaluators ana- 
lyzed online course completion data for the entire 
state: They found that CPS/VHS course comple- 
tion rates were 10 to 15 percent lower than com- 
parable rates for all students in IVHS in the fall of 
2004 and spring of 2005, but in the fall of 2005, 
when fewer students enrolled, CPS/VHS showed 
its highest completion rate ever, at 83.6 percent, 
surpassing IVHS’s informal target of 70 percent. 10 



Through these varied efforts, the researchers 
were able to get the information they need- 
ed. When their planned data collection effort 
stalled, they went back to their study goals and 
identified a different indicator for student sup- 
port and a different means of collecting data on 
it. And although the experimental study of the 
student orientation tutorial was abandoned, the 
evaluators ultimately provided useful informa- 
tion about this tool by observing and reporting 
on its use to program administrators. 

Evaluators very commonly have difficulty col- 
lecting data from respondents who do not feel 
personally invested in the evaluation, and online 
program evaluators are no exception. In the case 
of Louisiana’s Algebra I Online program, evalu- 
ators faced problems getting data from control 
group teachers. Initially, the state department of 
education had hoped to conduct a quasi-exper- 
imental study (see Glossary of Common Evalu- 
ation Terms, p. 65) comparing the performance 
of students in the online program with students 
in face-to-face settings. Knowing it might be a 
challenge to find and collect data from control 
groups across the state, the program adminis- 
trators required participating districts to agree 
up front to identify traditional classrooms (with 
student demographics matching those of online 
courses) that would participate in the collection 
of data necessary for ongoing program evalu- 
ation. It was a proactive move, but even with 
this agreement in place, the external evaluator 
found it difficult to get control teachers to ad- 
minister posttests at the end of their courses. 
The control teachers had been identified, but 
they were far removed from the program and its 
evaluation, and many had valid concerns about 
giving up a day of instruction to issue the test. 
In the end, many of the students in the compari- 



son classrooms did not complete posttests; only 
about 64 percent of control students were tested 
compared to 89 percent of online students. In 
2005, hurricanes Katrina and Rita created addi- 
tional problems, as many of the control group 
classrooms were scattered and data were lost. 
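
A gap of this size between groups is worth quantifying before any outcomes are compared. The short sketch below computes each group's response rate and the difference between them in percentage points. The guide reports only the percentages, so the enrollment and testing counts here are hypothetical values chosen solely to reproduce the reported 64 and 89 percent.

```python
def response_rate(tested, enrolled):
    """Share of enrolled students who completed the posttest."""
    if enrolled <= 0:
        raise ValueError("enrolled must be positive")
    return tested / enrolled


# Hypothetical counts chosen only to match the reported percentages.
control_rate = response_rate(tested=64, enrolled=100)
online_rate = response_rate(tested=89, enrolled=100)

# Differential response: how much more complete one group's data is.
gap_points = (online_rate - control_rate) * 100

print(
    f"Control: {control_rate:.0%}, online: {online_rate:.0%}, "
    f"gap: {gap_points:.0f} percentage points"
)
```

When one group is tested far less completely than the other, the students who were tested may no longer represent their group, so a large gap is a signal to interpret posttest comparisons with caution.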

In neither the Chicago nor the Louisiana case 
could the problem of collecting data from 
unmotivated respondents be tackled head-on, 
with incentives or redoubled efforts to follow 
up with nonrespondents, for example. (As is of- 
ten the case, such efforts were prohibited by the 
projects’ budgets.) Instead, evaluators turned to 
other, more readily available data. When plan- 
ning an evaluation, evaluators are wise to try to 
anticipate likely response rates and patterns and 
to develop a “Plan B” in case data collection ef- 
forts do not go as planned. In some cases, the 
best approach may be to minimize or eliminate 
any assessments that are unique to the evalua- 
tion and rely instead on existing state or district 
assessment data that can be collected without 
burdening students or teachers participating in 
control groups. The Family Educational Rights
and Privacy Act (FERPA) allows districts and
schools to release student records to a third par- 
ty for the purpose of evaluations. 11 

Both the Chicago and Louisiana examples serve 
as cautionary tales for evaluators who plan to 
undertake an experimental design. If evalua- 
tors plan to collect data from those who do not 
see themselves benefiting from the program or 
evaluation, the evaluation will need adequate 
money to provide significant incentives or, at a 
minimum, to spend substantial time and effort on 
communication with these individuals. 
Take Proactive Steps to Boost Response Rates

In the evaluation of Thinkport’s electronic field 
trip, Maryland Public Television was more suc- 
cessful in collecting data from control group 
classrooms. This program’s evaluation efforts 
differed from others in that it did not rely on 
participation from parties outside of the pro- 
gram. Instead, evaluators first chose participating 
schools, then assigned them to either a treatment 
group that participated in the electronic field trip 
right away or to a control group that could use 
the lessons in a later semester. Teachers in both 
groups received a one-hour introduction to the 
study, where they learned about what content 
they were expected to cover with their students, 
and about the data collection activities in which 
they were expected to participate. Teachers were 
informed on that day whether their classroom 
would be a treatment or control classroom; then 
control teachers were dismissed, and treatment 
teachers received an additional two-hour train- 
ing on how to use the electronic field trip with 
their students. Control teachers were given the 
option of receiving the training at the conclusion 
of the study, so they could use the tool in their 
classrooms at a future point. As an incentive, the 
program offered approximately $2,000 to the so- 
cial studies departments that housed both the 
treatment and control classes. 

Evaluators took other steps that may have paved 
the way for strong compliance among the control 
group teachers. They developed enthusiasm for 
the project at the ground level by first approach- 
ing social studies coordinators who became ad- 
vocates for and facilitators of the project. By ap- 
proaching these content experts, the evaluators 
were better able to promote the advantages of 
participating in the program and its evaluation. 



Evaluators reported that coordinators were ea- 
ger to have their teachers receive training on a 
new classroom tool, especially given the reputa- 
tion of Maryland Public Television. In turn, the 
social studies coordinators presented informa- 
tion about the study to social studies teachers 
during opening meetings and served as contacts 
for interested teachers. The approach worked: 
Evaluators were largely able to meet their goal 
of having all eighth-grade social studies teachers 
from nine schools participate, rather than having 
teachers scattered throughout a larger number 
of schools. Besides having logistical benefits for 
the evaluators, this accomplishment may have 
boosted compliance with the study’s require- 
ments among participating teachers. 

Evaluators also took the time to convince teach- 
ers of the importance of the electronic field 
trip. Before meeting with teachers, evaluators 
mapped the field trip’s academic content to the 
specific standards of the state and participating 
schools’ counties. They could then say to teach- 
ers: “These are the things your students have to 
do, and this is how the electronic field trip helps 
them do it.” Finally, evaluators spent time ex- 
plaining the purpose and benefits of the evalu- 
ation itself and communicating to teachers that 
they played an important role in discovering 
whether a new concept worked. 

The overall approach was even more success- 
ful than the evaluators anticipated and helped 
garner commitment among the participating 
teachers, including those assigned to the control 
group. The teachers fulfilled the data collection 
expectations outlined at the onset of the study, 
including keeping daily reports for the six days of 
the instructional unit being evaluated and com- 
pleting forms about the standards that they were 



teaching, as well as documenting general infor- 
mation about their lessons. By involving control 
teachers in the experimental process and giving 
them full access to the treatment, the evaluators 
could collect the data required to complete the 
comparative analysis they planned. 

Besides facing the challenge of collecting data 
from control groups, many evaluators, wheth- 
er of online programs or not, struggle even to 
collect information from regular program par- 
ticipants. Cognizant of this problem, evaluators 
are always looking for ways to collect data that 
are unobtrusive and easy for the respondent to 
supply. This is an area where online programs 
can actually present some advantages. For ex- 
ample, Appleton eSchool evaluators have found 
that online courses offer an ideal opportunity to 
collect survey data from participants. In some 
instances, they have required that students com- 
plete surveys (or remind their parents to do 
so) before they can take the final exam for the 
course. Surveys are e-mailed directly to students 
or parents, ensuring high response rates and 
eliminating the need to enter data into a data- 
base. Appleton’s system keeps individual results 
anonymous but does allow evaluators to see 
which individuals have responded. 
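
One way a survey system can keep answers anonymous while still showing who has responded is to store those two facts separately. The sketch below illustrates that idea only; it is not Appleton's actual system, and the class and method names are hypothetical.

```python
import random


class AnonymousSurvey:
    """Tracks WHO has responded separately from WHAT was answered."""

    def __init__(self, invited_ids):
        self.invited = set(invited_ids)
        self.responded = set()  # respondent IDs only, no answers
        self.answers = []       # answers only, no respondent IDs

    def submit(self, respondent_id, answer):
        if respondent_id not in self.invited:
            raise ValueError("respondent was not invited")
        if respondent_id in self.responded:
            raise ValueError("respondent has already answered")
        self.responded.add(respondent_id)
        # Insert at a random position so that storage order does not
        # reveal the order in which people responded.
        self.answers.insert(random.randint(0, len(self.answers)), answer)

    def outstanding(self):
        """IDs of invitees who still need a reminder."""
        return sorted(self.invited - self.responded)

    def response_rate(self):
        if not self.invited:
            return 0.0
        return len(self.responded) / len(self.invited)
```

With a design like this, staff could send reminders to everyone returned by `outstanding()` without being able to tell which stored answer belongs to which respondent.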

With the help of its curriculum provider, K12 
Inc., the Arizona Virtual Academy (AZVA) also 
frequently uses Web-based surveys to gather 
feedback from parents or teachers about particu- 
lar events or trainings they have offered. Surveys 
are kept short and are e-mailed to respondents 
immediately after the event. AZVA administrators 
believe the burden on respondents is relatively 
minimal, and report that the strategy consistent- 
ly leads to response rates of about 60 percent. 
The ease of AZVA’s survey strategy encourages
program staff to survey frequently. Furthermore,
K12 Inc. compiles the results in Microsoft Excel, 
making them easy for any staff member to read 
or merge into a PowerPoint presentation (see 
fig. 3, Example of Tabulated Results From an 
Arizona Virtual Academy Online Parent Survey, 
p. 38). Because the school receives anonymous 
data from K12 Inc., parents know they may be 
candid in their comments. With immediate, easy- 
to-understand feedback, AZVA staff are able to 
fine-tune their program on a regular basis. 
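
For readers who want to produce a similar summary themselves, the tabulation of a "select all that apply" question can be sketched in a few lines. This is an illustrative sketch, not the tool K12 Inc. uses; the function name and input layout are assumptions.

```python
from collections import Counter


def tabulate(responses, options):
    """Count how many respondents checked each option.

    `responses` holds one entry per respondent; each entry is the set
    of options that respondent selected. Returns one row per option:
    (option, count, whole-number percent of all respondents).
    """
    counts = Counter(choice for selected in responses for choice in selected)
    total = len(responses)
    return [
        (option, counts[option],
         round(100 * counts[option] / total) if total else 0)
        for option in options
    ]
```

Because respondents may check several options, the percentages are each computed against the number of respondents and need not sum to 100.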

Web sites and online courses can offer other 
opportunities to collect important information 
with no burden on the participant. For example, 
evaluators could analyze the different pathways 
users take as they navigate through a particu- 
lar online tool or Web site, or how much time 
is spent on different portions of a Web site or 
online course. If users are participants in a pro- 
gram and are asked to enter an identifying num- 
ber when they sign on to the site, this type of 
information could also be linked to other data 
on participants, such as school records. 
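
As a rough illustration of this kind of unobtrusive data collection, the sketch below estimates time spent per page and reconstructs a user's pathway from a simple log of page visits. The log layout (user ID, ISO timestamp, page name) and the function names are assumptions made for the example, not a description of any program's actual system.

```python
from collections import defaultdict
from datetime import datetime


def time_per_page(events):
    """Sum the seconds each user spent on each page.

    `events` is a list of (user_id, iso_timestamp, page) tuples.
    Time on a page is estimated as the gap until the same user's next
    event; a user's final event has no successor and is not counted.
    """
    by_user = defaultdict(list)
    for user_id, stamp, page in events:
        by_user[user_id].append((datetime.fromisoformat(stamp), page))

    totals = defaultdict(float)
    for user_id, visits in by_user.items():
        visits.sort(key=lambda visit: visit[0])
        for (start, page), (end, _next_page) in zip(visits, visits[1:]):
            totals[(user_id, page)] += (end - start).total_seconds()
    return dict(totals)


def pathway(events, user_id):
    """Return the ordered list of pages one user visited."""
    visits = sorted(
        (datetime.fromisoformat(stamp), page)
        for uid, stamp, page in events
        if uid == user_id
    )
    return [page for _, page in visits]
```

If the `user_id` is the identifying number participants enter at sign-on, the results could be joined to other participant records on that same key.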

Figure 3. Example of Tabulated Results From an Arizona Virtual Academy
Online Parent Survey About the Quality of a Recent Student Workshop*

Title I Student Workshop Survey: Phoenix

At this time 12 responses have been received. 100% returned, 12 surveys deployed via e-mail.

Please select all that apply regarding the Phoenix student workshop.
(Select all that apply)

                                                            Total    % of
                                                            Count    Total
My child enjoyed the student workshop.                         10     83%
My child enjoyed socializing with peers at the workshop.       12    100%
The activities were varied.                                    10     83%
My child enjoyed working with the teachers.                    11     92%
My child had a chance to learn some new math strategies.        8     67%
My child had the opportunity to review math skills.            10     83%
The activities were too hard.                                   0      0%
The activities were too easy.                                   1      8%

Please use the space below to let us know your thoughts about the student workshop. (Text limited to 250 characters.)
(Text input)

Recipient    Response

All 3 math teachers were very nice and understanding and they did not yell at you like the brick and mortar
schools, which made learning fun.

My son did not want to go when I took him. When I picked him up he told me he was glad he went.

She was bored, "it was stuff she already knew," she stated. She liked the toothpick activity. She could
have been challenged more. She LOVED lunch!

My child really enjoyed the chance to work in a group setting. She seemed to learn a lot and the games
helped things "click."

She was happy she went and said she learned a lot and had lots of fun. That is what I was hoping for
when I enrolled her. She is happy and better off for going.

THANKS FOR THE WORKSHOP NEED MORE OF THEM

The teachers were fun, smart and awesome, and the children learn new math strategies.

Thank you keep up the good work.

Source: Arizona Virtual Academy

* The U.S. Department of Education does not mandate or prescribe particular curricula or lesson plans. The information in this figure was provided by the
identified site or program and is included here as an illustration of only one of many resources that educators may find helpful and use at their option. The
Department cannot ensure its accuracy. Furthermore, the inclusion of information in this figure does not reflect the relevance, timeliness, or completeness of
this information; nor is it intended to endorse any views, approaches, products, or services mentioned in the figure.

Define Data Elements Across Many Sources

Another common problem for online evaluators
is the challenge of collecting and aggregating
data from multiple sources. As noted earlier,
Washington state’s DLC offers a wide array of
online resources for students and educators. To
understand how online courses contribute to
students’ progress toward a high school diploma
and college readiness, the program evaluators
conducted site visits to several high schools offering
courses through DLC. In reviewing student
transcripts, however, the evaluators discovered
that schools did not have the same practices
for maintaining course completion data and that
DLC courses were awarded varying numbers
of credits at different schools: the same DLC
course might earn a student 0.5 credit, 1 credit,
1.5 credits, or 2 credits. This lack of consistency
made it difficult to aggregate data across sites
and to determine the extent to which DLC courses
helped students to graduate from high school
or complete a college preparation curriculum.

The evaluation final report recommended de- 
veloping guidelines to show schools how to 
properly gather student-level data on each par- 
ticipating DLC student in order to help with 
future evaluations and local assessment of the 
program’s impact. 
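
To make the aggregation problem concrete, the sketch below shows one possible way to put credits from different schools on a common footing before combining them. It is an assumption-laden illustration, not the approach recommended in the DLC report: it presumes each school can state how many credits it awards for one completed course, and all names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CompletionRecord:
    student_id: str
    school: str
    course: str
    local_credits: float  # credits as recorded on the school's transcript


def course_equivalents(record, credits_per_course):
    """Convert a school's local credits into a shared unit.

    `credits_per_course` maps each school to the number of credits it
    awards for ONE completed course (a hypothetical lookup table that
    each site would have to supply).
    """
    return record.local_credits / credits_per_course[record.school]


def totals_by_student(records, credits_per_course):
    """Total completed-course equivalents per student across all sites."""
    totals = defaultdict(float)
    for record in records:
        totals[record.student_id] += course_equivalents(record, credits_per_course)
    return dict(totals)
```

Under this scheme, a course worth 0.5 credit at one school and 2 credits at another would each count as one completed course, so totals from both sites could be added together.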

Plan Ahead When Handling Sensitive Data 

Any time evaluators are dealing with student data, 
there are certain to be concerns about privacy. 
Districts are well aware of this issue, and most 
have guidelines and data request processes in 
place to make sure that student data are handled 
properly. Of course, it is critically important to 
protect student privacy, but for evaluators, these 
regulations can create difficulties. For example, 
in Chicago, Tom Clark, of TA Consulting, had 
a close partnership with the district’s Office of 
Technology Services, yet still had difficulty navi- 
gating through the district’s privacy regulations 
and lengthy data request process. Ultimately, 
there was no easy solution to the problem: This 
evaluator did not get all of the data he wanted 
and had to make do with less. Still, he did receive 
a limited amount of coded student demographic 
and performance data and, by combining it with 
information from his other data collection activi- 
ties, he was able to complete the study. 

District protocols are just one layer of protection
for sensitive student data; researchers also
must abide by privacy laws and regulations at
the state and federal levels. These protections 
can present obstacles for program evaluations, 
either by limiting evaluators’ access to certain 
data sets or by requiring a rigorous or lengthy 
data request process. For some online program 
evaluators, the problem is exacerbated because 
they are studying student performance at mul- 
tiple sites in different jurisdictions. 

Of course, evaluators must adhere to all relevant 
privacy protections. To lessen impacts on the 
study’s progress, researchers should consider 
privacy protections during the design phase and 
budget their time and money accordingly. Some- 
times special arrangements also can be made 
to gain access to sensitive data while still pro- 
tecting student privacy. Researchers might sign 
confidentiality agreements, physically travel to 
a specific site to analyze data (rather than have 
data released to them in electronic or hardcopy 
format), or even employ a neutral third party to 
receive data sets and strip out any identifying 
information before passing it to researchers. 

Evaluators also can incorporate a variety of pre- 
cautions in their own study protocol to protect 
students’ privacy. In the Thinkport evaluation, 
for example, researchers scrupulously avoided 
all contact with students’ names, referring to 
them only through a student identification num- 
ber. The teachers participating in the study took 
attendance on a special form that included name 
and student identification number, then tore the 
names off the perforated page and mailed the at- 
tendance sheet to the evaluators. Through such 
procedures, evaluators can maintain students’ 
privacy while still conducting the research need- 
ed to improve programs and instructional tools. 
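
The same idea can be applied to electronic records. The sketch below shows how a data holder or neutral third party might strip identifying fields and substitute a study-specific code before passing records to researchers. It is a minimal illustration under stated assumptions: the field names are hypothetical, and whether such a step satisfies a given privacy law or district policy would have to be confirmed case by case.

```python
import hashlib
import hmac

# Hypothetical names of the fields that identify a student.
IDENTIFYING_FIELDS = ("name", "student_id")


def pseudonym(student_id, secret_key):
    """Derive a stable study code from a student ID.

    A keyed hash (HMAC-SHA256) is used so that someone without the
    secret key cannot recompute the code from a known student ID.
    """
    digest = hmac.new(secret_key, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record, secret_key):
    """Return a copy of `record` with identifying fields removed."""
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    cleaned["study_id"] = pseudonym(record["student_id"], secret_key)
    return cleaned
```

Because the same student ID always yields the same study code, records from different data sets can still be linked, while the secret key stays with the party that holds the original data.
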

Summary

As these examples show, program leaders and 
evaluators can often take proactive steps to avoid 
and address data collection problems. To ensure 
cooperation from study participants and boost 
data collection rates, they should build in plenty 
of time (and funds, if necessary) to communi- 
cate their evaluation goals with anyone in charge 
of collecting data from control groups. Program 
leaders also should communicate with students
participating in evaluation studies, both to explain
the evaluation’s goals and to describe how students
will ultimately benefit from it. These
communication efforts should begin early and 
continue for the project’s duration. Additionally, 
program leaders should plan to offer incentives 
to data collectors and study participants. 

Evaluators may want to consider administering 
surveys online, to boost response rates and eas- 
ily compile results. In addition, there may be 
ways they can collect data electronically to bet- 
ter understand how online resources are used 
(e.g., by tracking different pathways users take 
as they navigate through a particular online tool 
or Web site, or how much time is spent on dif- 
ferent portions of a Web site or online course). 

When collecting data from multiple agencies,
evaluators should consider in advance the format
and comparability of the data, making sure
to define precisely what information should be
collected and communicating these definitions to
all those who are collecting and maintaining it.

Evaluators should research relevant privacy laws
and guidelines well in advance and build in adequate
time to navigate the process of requesting
and collecting student data from states, districts,
and schools. They may need to consider
creative and flexible arrangements with agencies
holding student data, and should incorporate
privacy protections into their study protocol
from the beginning, such as referring to students
only through identification numbers.

Finally, if data collection problems cannot be 
avoided, and evaluators simply do not have 
what they need to complete a planned analysis, 
sometimes the best response is to redesign or re- 
focus the evaluation. One strategy involves find- 
ing other sources of data that put more control 
into the hands of evaluators (e.g., observations, 
focus groups). 

Interpreting the Impact of Program 
Maturity 

Online learning programs are often on the cutting 
edge of education reform and, like any new tech- 
nology, may require a period of adaptation. For 
example, a district might try creating a new online 
course and discover some technical glitches when 
students begin to actually use it. Course creators 
also may need to fine-tune content, adjusting how 
it is presented or explained. For their part, students 
who are new to online course taking may need 
some time to get used to the format. Perhaps they 
need to learn new ways of studying or interacting 
with the teacher to be successful. If an evaluation 
is under way as all of this is going on, its findings 
may have more to do with the program’s newness 
than its quality or effectiveness. 

Because the creators of online programs often
make adjustments to policies and practices while
perfecting their model, it is ideal to wait until
the program has had a chance to mature before
judging its effectiveness. At the same time, online
learning programs are often under pressure to
demonstrate effectiveness
right away. Although early evaluation efforts 
can provide valuable formative information for 
program improvement, they can sometimes be 
premature for generating reliable findings about 
effectiveness. Worse, if summative evaluations 
(see Glossary of Common Evaluation Terms, 
p. 65) are undertaken too soon and show dis- 
appointing results, they could be damaging to 
programs’ reputations or future chances at fund- 
ing and political support. 

What steps can evaluators and program leaders 
take to interpret appropriately the impact of the 
program’s maturity on evaluation findings? 

Several of the programs featured in this guide 
had evaluation efforts in place early on, some- 
times from the very beginning. In a few cases, 
evaluators found less-than-positive outcomes at 
first and suspected that their findings were relat- 
ed to the program’s lack of maturity. In these in- 
stances, the evaluators needed additional infor- 
mation to confirm their hunch and also needed 
to help program stakeholders understand and 
interpret the negative findings appropriately. 
In the case of Thinkport, for example, evalu- 
ators designed a follow-up evaluation to pro- 
vide more information about the program as it 
matured. In the case of the Algebra I Online 
program, evaluators used multiple measures to 
provide stakeholders with a balanced perspec- 
tive on the program’s effectiveness. 

Conduct Follow-up Analyses for Deeper 
Understanding 

In evaluating the impact of Thinkport’s electronic
field trip about slavery and the Underground Railroad,
evaluators from Macro International conducted
a randomized controlled trial (see Glossary
of Common Evaluation Terms, p. 65), with
the aim of understanding whether students who
used this electronic field trip learned as much as
students who received traditional instruction in
the same content and did not use the field trip.

Initially, the randomized controlled trial re- 
vealed a disappointing finding: The electronic 
field trip did not impact student performance 
on a test of content knowledge any more or 
less than traditional instruction. A second phase 
of the study was initiated, however, when the 
evaluators dug deeper and analyzed whether 
teachers who had used an electronic field trip 
before were more successful than those using it 
for the first time. When they disaggregated the 
data, the evaluators found that students whose 
teachers were inexperienced with the electron- 
ic field trip actually learned less compared to 
students who received traditional instruction; 
however, students whose teachers had used the 
electronic field trip before learned more than 
the traditionally taught students. In the second 
semester, the evaluators were able to compare 
the effect of the treatment teachers using the 
electronic field trip for the first time to the effect 
of those same teachers using it a second time. 
This analysis found that when teachers used the 
electronic field trip a second time, its effective- 
ness rose dramatically. The students of teachers 
using the field trip a second time scored 121 
percent higher on a test of knowledge about 
the Underground Railroad than the students in 
the control group (see Glossary of Common 
Evaluation Terms, p. 65) who had received 
traditional instruction on the same content. 
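
The value of disaggregating can be seen in a small worked example. The sketch below uses invented scores, not data from the Thinkport study, to show how a pooled comparison can hide opposite effects in two subgroups. The field names are hypothetical, and the percent calculation assumes a simple relative difference between group means; the guide does not state how the study's own figure was computed.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(records, group_key, value_key="score"):
    """Average `value_key` within each value of `group_key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {group: mean(values) for group, values in groups.items()}


def percent_higher(treatment_mean, control_mean):
    """Relative difference between two means, as a percentage."""
    return (treatment_mean - control_mean) * 100 / control_mean


# INVENTED scores for illustration only.
records = [
    {"arm": "control", "experience": "none", "score": 10},
    {"arm": "control", "experience": "none", "score": 10},
    {"arm": "treatment", "experience": "first_time", "score": 6},
    {"arm": "treatment", "experience": "first_time", "score": 8},
    {"arm": "treatment", "experience": "repeat", "score": 12},
    {"arm": "treatment", "experience": "repeat", "score": 14},
]

pooled = mean_by_group(records, "arm")        # treatment vs. control
split = mean_by_group(records, "experience")  # by teacher experience
```

In this invented data set the pooled treatment mean equals the control mean (both 10), which by itself would suggest no effect. Split by teacher experience, the first-time group averages 7 and the repeat group averages 13, so the pooled result masks a loss for one subgroup and a gain for the other.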

The evaluators also looked at teachers’ respons- 
es to open-ended survey questions to under- 
stand better why second-time users were so 
much more successful than first-timers. Novice
users reported that “they did not fully under- 
stand the Web site’s capabilities when they be- 
gan using it in their classes and that they were 
sometimes unable to answer student questions 
because they didn’t understand the resource 
well enough themselves.” 12 On the other hand, 
teachers who had used the electronic field trip
once before “showed a deeper understanding 
of its resources and had a generally smoother 
experience working with the site.” 

In this instance, teachers needed time to learn 
how to integrate a new learning tool into their 
classrooms. Although not successful at first, the 
teachers who got past the initial learning curve 
eventually became very effective in using the field 
trip to deliver content to students. This finding 
was important for two reasons. First, it suggested 
an area for program improvement: to counter the 
problem of teacher inexperience with the tool, 
the evaluators recommended that the program 
offer additional guidance for first-time users. Sec- 
ond, it kept the program’s leaders from reaching 
a premature conclusion about the effectiveness 
of the Pathways to Freedom electronic field trip. 

This is an important lesson, say Thinkport’s leaders,
for them and others. As Helene Jennings,
vice-president of Macro International, explains,
“There’s a lot of enthusiasm when something is 
developed, and patience is very hard for peo- 
ple. . . . They want to get the results. It’s a natural 
instinct.” But, she says, it’s important not to rush 
to summative judgments: “You just can’t take it 
out of the box and have phenomenal success.” 
Since the 2005 evaluation, the team has shared 
these experiences with other evaluators at several 
professional conferences. They do this, says Macro
International Senior Manager Michael Long, to
give other evaluators “ammunition” when they
are asked to evaluate the effectiveness of a new 
technology too soon after implementation. 

Use Multiple Measures to Gain a Balanced 
Perspective 

Teachers are not the only ones who need time 
to adapt to a new learning technology; students 
need time as well. Evaluators need to keep in 
mind that students’ inexperience or discomfort 
with a new online course or tool also can cloud 
evaluation efforts — especially if an evaluation is 
undertaken early in the program’s implementa- 
tion. The Algebra I Online program connects 
students to a certified algebra teacher via the In- 
ternet, while another teacher, who may or may 
not be certified, provides academic and technical 
support in the classroom. When comparing the 
experiences of Algebra I Online students and tra- 
ditional algebra students, EDC evaluators found 
mixed results. On the one hand, online students 
reported having less confidence in their alge- 
bra skills. Specifically, about two-thirds of stu- 
dents from the control group (those in traditional 
classes) reported feeling either confident or very 
confident in their algebra skills, compared to 
just under half of the Algebra I Online students. 
The online students also were less likely to re- 
port having a good learning experience in their 
algebra class. About one-fifth of Algebra I On- 
line students reported that they did not have a 
good learning experience in the class, compared 
to only 6 percent of students in regular algebra 
classes. On the other hand, the online students 
showed achievement gains that were just as high 
or higher than those of traditional students. Spe- 
cifically, the Algebra I Online students outscored 
students in control classrooms on 18 of 25 posttest
items, and they also tended to do better on
those items that required them to create an algebraic
expression from a real-world example.

In an article they published in the Journal of 
Research on Technology in Education, the evalu- 
ators speculated about why the online students 
were less confident in their algebra skills and had 
lower opinions of their learning experience: “It 
may be that the model of delayed feedback and 
dispersed authority in the online course led to a
‘lost’ feeling and prevented students from being
able to gauge how they were doing.” 13 In other
words, without immediate reassurance from the 
teacher of record, students may have felt they 
weren’t “getting it,” when, in fact, they were. 

This example suggests that students’ unfamiliar- 
ity with a new program can substantially affect 
their perceptions and experiences. The evalu- 
ators in this case were wise to use a variety of 
measures to understand what students were ex- 
periencing in the class. Taken alone, the stu- 
dents’ reports about their confidence and learn- 
ing experience could suggest that the Algebra I 
Online program is not effective. But when the 
evaluators paired the self-reported satisfaction 
data with test score data, they were able to see 
the contradiction and gain a richer understand- 
ing of students’ experiences in the program. 

Summary 

A lack of program maturity is not a reason to
forgo evaluation. On the contrary, evaluation
can be extremely useful in the early phases of 
program development. Even before a program is 
designed, evaluators can conduct needs assess- 
ments to determine how the target population 
can best be served. In the program’s early im- 
plementation phase, evaluators can conduct 
formative evaluations that aim to identify areas
for improvement. Then, once users have had
time to adapt, and program developers have 
had time to incorporate what they’ve learned 
from early feedback and observations, evalua- 
tors can turn to summative evaluations to deter- 
mine effectiveness. 

When disseminating findings from summative 
evaluations, program leaders should work with 
their evaluator to help program stakeholders 
understand and interpret how program maturity 
may have affected evaluation findings. The use of 
multiple measures can help provide a balanced 
perspective on the program’s effectiveness. Pro- 
gram leaders also may want to consider repeat- 
ing a summative evaluation to provide more in- 
formation about the program as it matures. 

Given the sophistication of many online learn- 
ing programs today, it takes an extraordinary 
amount of time and money up front to create 
them. This, and stakeholders’ eagerness for find- 
ings, makes these programs especially vulner- 
able to premature judgments. Evaluators have 
an important message to communicate to stake- 
holders: Evaluation efforts at all stages of devel- 
opment are critical to making sure investments 
are well spent, but they need to be appropriate 
for the program’s level of maturity. 

Translating Evaluation Findings Into 
Action 

As the phases of data collection and analysis 
wind down, work of another sort begins. Evalu- 
ators present their findings and, frequently, their 
recommendations; then program leaders begin 
the task of responding to them. Several factors 
contribute to the ease and success of this process: 
the strength of the findings, the clarity and specificity
of the recommendations, how they are disseminated,
and to whom. The relationship between
the evaluators and the program leaders is
key: When evaluators (external or internal) have 
ongoing opportunities to talk about and work 
with program staff on improvements, there is 
greater support for change. Conversely, if evalu- 
ators are fairly isolated from program leaders 
and leave the process once they have presented 
their recommendations, there is less support 
and, perhaps, a reduced sense of accountability 
among program staff. Of course, while frequent 
and open communication is important, main- 
taining objectivity and a respectful distance are 
also critical to obtaining valid research findings. 
Collaborations between researchers and practi- 
tioners should not be inappropriately close. 

When program leaders try to act on evaluation 
findings, the structure and overall health of the 
organization play a role as well. Some program 
leaders meet with substantial barriers at this 
stage, particularly if they are trying to change 
the behavior of colleagues in scattered program 
sites or other offices or departments. The prob- 
lem is compounded if there is a general lack of 
buy-in or familiarity with the evaluation. 

In such circumstances, how can program leaders 
and evaluators translate findings into program 
improvements? Among the programs featured 
in this guide, there are a variety of approaches 
to using evaluation findings to effect change. 
In the CPS/VHS, for example, program lead- 
ers have used evaluation findings to persuade 
reluctant colleagues to make needed changes 
and have repeatedly returned to the evaluation 
recommendations to guide and justify internal 
decisions. Meanwhile, AZVA program staff use
a structured, formal process for turning evaluation
recommendations into program improvements,
including establishing timelines, staff assignments,
and regular status reports. The AZVA
system, though time-consuming, has helped 
program administrators implement changes. 

Use Evaluation Findings to Inform and 
Encourage Change 

Chicago’s CPS/VHS is managed and implement- 
ed collaboratively by three CPS offices: the Of- 
fice of Technology Services eLearning, the Of- 
fice of High School Programs, and the Office 
of Research, Evaluation, and Accountability. In 
2005, Chief eLearning Officer Sharnell Jackson 
initiated an external evaluation to understand 
the cause of low course completion rates and 
find ways to help struggling students. 

Evaluator Tom Clark of TA Consulting found 
great variation in students’ ability to work inde- 
pendently, manage their time, and succeed with- 
out having an instructor physically present. The 
evaluation report recommended several ways to 
offer more support for struggling students, in- 
cluding having a dedicated class period in the 
school schedule for completing online course 
work and assigning on-site mentors to assist stu- 
dents during these periods. But when program 
administrators tried to implement these recom- 
mendations, they had difficulty compelling all 
participating schools to change. CPS is a large 
district with a distributed governance structure, 
making it difficult for the central office to force 
changes at the school-site level. 

Facing resistance from schools, the program’s 
administrators tried several different tacks to 
encourage implementation of the recommendations.
First, they took every opportunity to
communicate the evaluation findings to area
administrators and principals of participating 
schools, making the case for change with cred- 
ible data from an external source. Some school 
leaders resisted, saying they simply did not have 
the manpower or the funds to assign on-site 
mentors. Still, they could not ignore the com- 
pelling data showing that students needed help 
with pacing, study skills, and troubleshooting the 
technology; without this help many were failing. 
The strength of these findings, along with finan- 
cial assistance from the district to provide mod- 
est stipends, convinced school site leaders to 
invest in mentors. Crystal Brown, senior analyst 
in CPS’s Office of Technology Services, reports 
that most CPS/VHS students now have access to 
an on-site mentor, “whereas before they just told 
a counselor, ‘I want to enroll in this class,’ and 
then they were on their own.” Brown says the
evaluation also has been useful for prodding
principals to provide professional development
for mentors and for persuading mentors to
participate. Program leaders use the evaluation
findings “whenever we train a mentor,” she
says, “and that’s how we get a lot of buy-in.”

CPS/VHS administrators also are careful to set
a good example by using the evaluation findings
and recommendations to guide their own
practices at the central office. To date, they have
implemented several “high priority” recommendations
from the report. For example, program
leaders strengthened mentor preparation by instituting
quarterly trainings for mentors and establishing
a shared online workspace that provides
guidelines and advice for mentors. The
district also has implemented Advancement Via
Individual Determination* programs in many
high schools to boost students’ study skills and
support their achievement in the online program.
As CPS/VHS expands, some of the earlier
problems have been avoided by getting site
administrators to agree up front to the practices
recommended by the evaluation. Brown says,
“We constantly reiterate what this study recommended
whenever we have any type of orientation
[for] a new school that’s enrolling.”

* Advancement Via Individual Determination (AVID) is a program that prepares 4th
through 12th grade students for four-year college eligibility.

Finally, in some instances, the program adminis- 
trators changed program requirements outright 
and forced participating schools to comply. Be- 
ginning this year, for example, all online classes 
must have a regularly scheduled time during the 
school day. (There are a few exceptions made 
for very high-performing students.) This change 
ensures that students have dedicated computer 
time and mentor support to help them success- 
fully complete their course work on time. In ad- 
dition, participating students are now required 
to attend an orientation for the online courses 
where they receive training on study skills. 

Take a Structured Approach to Improvement 

Changing behavior and policy can be difficult in 
a large organization and, as the above example 
shows, program administrators must be creative 
and persistent to make it happen. Sometimes 
a small organization, such as an online school 
with a small central staff, has a distinct advan- 
tage when trying to implement evaluation rec- 
ommendations. With a nimble staff and a strong 
improvement process in place, AZVA, for exam- 
ple, has been especially effective in making pro- 
gram changes based on findings from its many 
evaluation efforts. 

Several factors explain AZVA’s success in 
translating evaluation findings into program 
improvements. First, its evaluation process generates
detailed recommendations from both outsiders
and insiders. That is, staff from its main
content provider, K12 Inc., visit AZVA approxi- 
mately every other year and conduct a quality 
assurance audit to identify areas for program im- 
provement. Following the audit, K12 Inc. devel- 
ops a series of recommendations and, in turn, 
AZVA creates a detailed plan that shows what ac- 
tions will be taken to address each recommenda- 
tion, including who will be responsible and the 
target date for completion. For example, when 
the 2005 audit recommended that AZVA create 
a formal feedback loop for teachers, the school 
assigned a staff member to administer monthly 
electronic surveys to collect information from 
teachers about the effectiveness of their profes- 
sional development, their training and technol- 
ogy needs, their perceptions of parent training 
needs, and their suggestions for enhancing com- 
munity relations. AZVA has responded in similar 
fashion to many other recommendations gener- 
ated by K12 Inc.’s site visit, addressing a range 
of organizational, instructional, and operational 
issues (see fig. 4, Excerpts From AZVA’s Next 
Steps Plan, in Response to Recommendations 
From the K12 Quality Assurance Audit, p. 47). 

In addition to the audit, K12 Inc. also requires 
that AZVA complete an annual School Improve- 
ment Plan (SIP), which consists of two parts: a 
self-evaluation of school operations in general 
and a Student Achievement Improvement Plan 
(SAIP) that specifically focuses on student out- 
comes. In developing these plans, AZVA staff ar- 
ticulate a series of goals and specific objectives 
for improvement, again including strategies and 
timelines for meeting each objective. A strength 
of this process is its specificity. For example, one 
key SAIP goal was to improve student achievement
in math, and the administrative team set the
specific goal of decreasing by 5 percent the num- 
ber of students who score “far below the stan- 
dards” on the state standards test and increasing 
by 5 percent the number who meet or exceed 
state standards. To accomplish this, AZVA staff 
took action on several fronts: they aligned their 
curriculum to the state’s testing blueprint, devel- 
oped a new curriculum sequencing plan, imple- 
mented additional teacher and parent training, 
worked with students to encourage test prepara- 
tion and participation, and developed individual 
math learning plans for students. 

Another strength of AZVA’s process is that it re- 
quires program staff to review evaluation recom- 
mendations regularly and continually track the 
progress that has been made toward them. The 
SIP and SAIP are evolving plans that are regularly 
updated and revised by “basically everyone that 
has any role in instruction,” such as the director 
of instruction, the high school director, and the 
special education manager, says AZVA’s director, 
Mary Gifford. As part of this process, team mem- 
bers continually return to the document and track 
how much progress has been made in reaching 
their goals. There also is external accountability 
for making progress on SIP and SAIP goals: ap- 
proximately once a quarter, AZVA staff review 
the plans via conference calls with K12 Inc. 

Finally, AZVA’s approach succeeds because it 
permeates the work of the entire school. As 
Bridget Schleifer, the K-8 principal, explains,
“Evaluation is built into everybody’s role and 
responsibility.” Staff members at all levels are 
expected to take evaluation recommendations 
seriously and to help to implement changes 
based on them. 



Figure 4. Excerpts From AZVA's Next Steps Plan, In Response to
Recommendations From the K12 Quality Assurance Audit*

Board and Organizational Structure

K12 Inc. Site Visit Recommendation: AZVA should create additional
mechanisms to centralize communications.
Actions: The AZVA administration will survey school staff to determine
communications needs. The administration will evaluate technology to
assist with centralizing communications. The administration will develop
a communications plan and begin implementation in fall 2005.
Timeline: 12/31/2005
Responsible: Janice Gruneberg
Status: Will begin in summer 2005

K12 Inc. Site Visit Recommendation: AZVA should create a formal feedback
loop with teachers.
Actions: The AZVA administration will develop a survey instrument to
collect information from teachers regarding the effectiveness of
professional development, outside training needs and effectiveness of
outside training opportunities, technology needs, parent training needs,
and community relations suggestions. The survey will be administered at
monthly professional development meetings and available in an electronic
format to capture contemporaneous feedback.
Timeline: 8/1/2005
Responsible: Jacque Johnson-Hirt
Status: Will begin development in July

Instruction

K12 Inc. Site Visit Recommendation: AZVA should consider designing new
teacher training and ongoing teacher training for each year in chunks and
stages, rather than as one long week of training; this structure would
allow practice and reflection.
Actions: The director of instruction is developing the new teacher
training agenda and sessions. Summer projects will be assigned in May;
lead teachers will work with the director of instruction to develop
training modules.
Timeline: 8/1/2005
Responsible: Jacque Johnson-Hirt
Status: On track to meet timeline

K12 Inc. Site Visit Recommendation: As AZVA continues to grow, it should
give attention to scalability in all initiatives from teacher mentoring
and group Professional Development and training to student work samples.
Actions: The director of instruction is revising the Parent Orientation
Guide (includes work sample guidelines). Lead teachers and regular
education teachers will have summer projects based on teacher training
and mentoring needs.
Timeline: 8/1/2005
Responsible: Jacque Johnson-Hirt
Status: On track to meet timeline

K12 Inc. Site Visit Recommendation: AZVA should consider a way for
administration to review aggregate views of low-performing students as
the school continues to grow (one suggestion could be having teachers
submit the aggregate view, highlighted with red or yellow depending on
level of attention needed).
Actions: AZVA administration has requested a part-time administrative
position to manage and analyze student-level data. The administration has
also requested a formal registrar position to capture data on incoming
students. The assistant director for operations is revising the teacher
tracking tool with this goal in mind.
Timeline: 9/15/2005
Responsible: Janice Gruneberg, Jacque Johnson-Hirt
Status: On track to meet timeline

Community Relations

K12 Inc. Site Visit Recommendation: AZVA should continue to develop ways
to encourage students and parents to attend outings and encourage
increased participation in school activities.
Actions: The director of instruction is working with teachers to develop
lesson-based outings based on parent feedback. A master outing schedule
and plans will be developed and included in the revised Parent
Orientation Guide.
Timeline: 7/30/2005
Responsible: Jacque Johnson-Hirt
Status: Administrative team is making summer project assignments in May;
on track to meet timeline

Special Education

K12 Inc. Site Visit Recommendation: AZVA should monitor the special
education students' state test scores from year to year in correlation
with their IEP goals to ensure proper growth as well as comparable growth
to same age non-disabled peers.
Actions: The administration is requesting a part-time staff position to
manage and analyze student-level data. The SPED team is currently
developing a process to measure progress on IEP goals, including test
score gains. State test scores will be available in June.
Timeline: 7/15/2005
Responsible: Lisa Walker
Status: SPED team developing process; on track to meet timeline

* The U.S. Department of Education does not mandate or prescribe particular curricula or lesson plans. The information in this figure was provided by the 
identified site or program and is included here as an illustration of only one of many resources that educators may find helpful and use at their option. The 
Department cannot ensure its accuracy. Furthermore, the inclusion of information in this figure does not reflect the relevance, timeliness, or completeness of this 
information; nor is it intended to endorse any views, approaches, products, or services mentioned in the figure. 

Summary 

As the above examples show, whether and how 
evaluation findings lead to program improve- 
ments is a function not only of the quality of the 
evaluation, but of many other contextual and or- 
ganizational factors. Program leaders can facili- 
tate program change by working from the begin- 
ning to create an ongoing relationship between 
evaluators and program staff. Throughout the 
process, there should be opportunities for staff 
members to discuss the evaluation, its findings, 
and its implications for improving the program. 
Off-site staff and partners should be included as 
well. In short, program leaders should commu- 
nicate early and often about the evaluation with 
anyone whose behavior might be expected to 
change as a result of its findings. 

Once findings and recommendations are avail- 
able, program leaders might want to consider 
using a structured, formal process for turning 
those recommendations into program improve- 
ments. One approach is to decide on a course 
of action, set a timeline, and identify who will 
be responsible for implementation. Whether us- 
ing a formal process or not, program leaders 
should revisit recommendations regularly and 
continually track the progress that has been 
made toward them. 



In some instances, recommended changes to an 
online program or resource may be technically 
hard to implement. For example, it may be diffi- 
cult and expensive to make changes to the content 
or format of an online course. It also may be quite 
costly to change, repair, or provide the hardware 
needed to improve an online program. Insuffi- 
cient funding may cause other difficulties as well. 
If the program has relied on external funding that 
has subsequently run out, there may be pressure 
to dilute the program’s approach; for example, 
districts might feel pressured to keep an online 
course but eliminate the face-to-face mentors that 
support students as they proceed through it. 

Online program evaluators need to keep these 
kinds of practical challenges in mind when for- 
mulating recommendations and should consider 
ranking their suggestions both in order of im- 
portance and feasibility. Program leaders should 
develop a plan for addressing the highest prior- 
ity recommendations first. In situations where 
program funds are running low, evaluators can 
provide a much-needed external perspective, 
reminding stakeholders of the project’s goals
and helping them identify the most critical
program elements to keep, even if it means serving
fewer participants. More broadly, communication
and persistence are essential when attempting
to translate evaluation findings into action.

PART II

Recommendations for Evaluating
Online Learning

Part I draws on seven featured evaluations to 
illustrate the challenges of evaluating online 
learning and to describe how different evalua- 
tors have addressed them. Part II offers recom- 
mendations that synthesize the lessons learned 
in these examples and also draw from research 
and conversations with experts in evaluating 
online learning. These recommendations aim to 
provide guidance to program leaders who are 
contemplating an evaluation and to assist pro- 
gram leaders and evaluators who are already 
working together to complete one. 

General Recommendations 

Several recurring themes can be found in the
seven featured evaluations, as well as in the
research and comments from experts in the field
of online learning. These sources point to a
handful of overarching recommendations:

• Begin with a clear vision for the evaluation. 
Determine what you want the evaluation to 
accomplish and what questions you hope to 
answer. Program leaders and evaluators may 
want to consider the questions listed in the 
box on page 51 as they get started. 

• Determine the most appropriate evaluation
methods for meeting your goals. Consider the
different types of evaluations discussed in the
guide (see Part I for examples of formative
and summative; internal and external;
experimental, quasi-experimental, and other types
of evaluations) and the costs and benefits of
each. What type of evaluation is appropriate
given your stated purpose? What research
methods will best capture program effects?

• Budget to meet evaluation needs. Limited 
budgets are a common barrier to evalua- 
tors. When designing evaluations, consider 
whether there are available funds to cover 
all planned data collection and analysis ac- 
tivities, plus the costs of any needed back- 
ground research, internal communications 
and reporting, and incentives for study par- 
ticipants. If available funds are not sufficient, 
scale back the evaluation and focus on the 
highest-priority activities. 

• Develop a program culture that supports eval- 
uation. Discuss evaluation with staff mem- 
bers, clearly explaining its value and their 
roles in collecting and analyzing data. Incor- 
porate data collection and analysis activities 
into staff members’ everyday responsibilities 
instead of treating these tasks as one-time ef- 
forts or burdens. Make external evaluators 
less threatening to program staff members by 
creating opportunities for dialogue between 
the evaluator and program leaders and staff. 
Incorporate evaluation data into the development
of annual program goals and into strategic
planning activities.

• Communicate early and often with anyone
who will be affected by the evaluation.

• Dedicate adequate time and money to
communicating with internal and external
stakeholders at all phases of the evaluation.
However, be sensitive to program concerns about
keeping evaluation results confidential and
avoiding media scrutiny. Always keep those
managing the program “in the loop.”
Communicating through the program director is
often the best course of action.

Recommendations for Meeting the 
Needs of Multiple Stakeholders 

• Identify up front the various stakeholders who 
will be interested in the evaluation and what 
specifically they will want to know. Consider 
conducting interviews or focus groups to col- 
lect this information. 

• Use this information to determine the evalua- 
tion’s main purpose(s) and to develop ques- 
tions that will guide the study. An evaluation 
that clearly addresses stakeholder needs and 
interests is more likely to yield findings and 
recommendations they will find worthy and 
champion. 

• If trying to fulfill both program improvement 
and accountability purposes, consider us- 
ing evaluation approaches that can generate 
both formative and summative information 
(see Glossary of Common Evaluation Terms, 
p. 65). Be realistic at this stage and keep in 
mind the constraints of the evaluation budget 
and timeline. 

• Think early on about how the evaluation will 
incorporate student learning outcomes for 
accountability purposes. Consider a range of 
academic outcomes that might be appropriate 
to study, including scores on state-mandated 
and other standardized tests, course comple- 
tions, grades, and on-time graduation. 

• If the program has multiple components,
determine the ones with the best potential to
have a direct measurable impact on student
achievement, and focus your study of student
outcomes there.

• Consider using “dashboard indicators,” the
two or three most critical goals to be measured.
How can the measures be succinctly
communicated? Can they be graphed to
demonstrate improved performance over time?
(For example, an online program trying to
increase access to Advanced Placement (AP)*
courses might have a dashboard indicator
composed of the number of schools accessing
online AP courses, the AP exam pass rate, and
the number of students taking AP exams as a
percentage of the total number of students in
online AP courses.)

• In the reporting phase, think about what 
findings will be of most interest to different 
stakeholders. Consider communicating find- 
ings to different audiences in ways that are 
tailored to their needs and interests. 

• If already participating in mandatory evalu- 
ation activities, think about how those find- 
ings can be used for other purposes, such as 
making internal improvements. Disseminate 
any mandatory evaluation reports to the staff 
and discuss how their findings can be used to 
strengthen the program. Consider developing 
additional data collection efforts to supple- 
ment the mandatory evaluation. 
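
As a rough illustration of the dashboard-indicator idea in the list above, the following Python sketch computes the three AP-related measures from that example and lines them up by year so they could be graphed. The field names, the simplification that each student takes one exam, and all of the counts are hypothetical; none are drawn from a program featured in this guide.

```python
from dataclasses import dataclass


@dataclass
class YearCounts:
    """Raw counts for one school year (all values below are hypothetical)."""
    label: str
    schools_accessing: int    # schools accessing online AP courses
    online_ap_students: int   # students enrolled in online AP courses
    exam_takers: int          # enrolled students who took an AP exam
    exam_passers: int         # exam takers who passed


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place.

    Returns 0.0 when the denominator is zero; a program might prefer
    to report such a year as "not applicable" instead.
    """
    if whole == 0:
        return 0.0
    return round(100.0 * part / whole, 1)


def dashboard_row(counts: YearCounts) -> dict:
    """Reduce one year of raw counts to the three dashboard indicators."""
    return {
        "year": counts.label,
        "schools_accessing": counts.schools_accessing,
        "exam_pass_rate_pct": percent(counts.exam_passers, counts.exam_takers),
        "exam_taking_rate_pct": percent(counts.exam_takers, counts.online_ap_students),
    }


# Hypothetical two-year history, listed in the order it would be graphed.
history = [
    YearCounts("Year 1", 10, 200, 120, 60),
    YearCounts("Year 2", 14, 260, 182, 100),
]
dashboard = [dashboard_row(counts) for counts in history]
for row in dashboard:
    print(row)
```

Keeping the raw counts alongside the percentages is a deliberate choice in this sketch: a rising pass rate means something different if the number of exam takers has fallen.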

Recommendations for Utilizing and 
Building on the Existing Base of 
Knowledge 

• Look to established standards for online 
learning to determine elements of high- 
quality programs. 

• Program evaluation has a long history, and
its basic approaches can be adapted to fit
any unique educational program. The Program
Evaluation Standards (1994) from the
Joint Committee on Standards for Educational
Evaluation and other guides (see Publications
and Tools for Evaluation in appendix A) can
be helpful to those developing an online
learning evaluation.

* Run by the nonprofit College Board, the Advanced Placement program offers
college-level course work to high school students. Many institutions of higher
education offer college credits to students who take AP courses.

Important Questions for K-12 Online Learning Evaluators and
Practitioners

In a synthesis of research on K-12 online learning, Rosina Smith of the Alberta Online Consortium, Tom Clark of TA
Consulting, and Robert Blomeyer of Blomeyer & Clemente Consulting Services developed a list of important
questions for researchers of K-12 online learning to consider. 14 That list, which drew from the work of Cathy Cavanaugh
et al., 15 has been adapted here for program and evaluation practitioners. Of course, no evaluation can cover all of
these questions; this list is meant to provide a starting point for discussing evaluation goals and topics of interest.

Learner outcomes. What is the impact of the K-12 online learning program on student achievement? What factors
can increase online course success rates? What impact does the program have on learner process skills, such as
critical and higher-order thinking? How are learner satisfaction and motivation related to outcomes?

Learner characteristics. What are the characteristics of successful learners in this K-12 online learning program,
and can success be predicted? How do learner background, preparation, and screening influence academic
outcomes in the program?

Online learning features. What are the most effective combinations of media and methods in the online learning
program? How do interaction, collaboration, and learner pacing influence academic outcomes? What is the
impact of the K-12 online learning program when used as a supplement, in courses, or in full programs of study?

Online teaching and professional development. What are the characteristics of successful teachers in this K-12
online learning program? Are the training, mentoring, and support systems for these teachers effective?

Education context. What kinds of programs and curricula does the K-12 online learning program offer? How can the
program best be used to improve learner outcomes in different content areas, grade levels, and academic programs?
How can it help participating schools meet NCLB requirements? How do resources, policy, and funding impact the
success of the program? Can an effective model of K-12 online learning be scaled up and sustained by the program?

• Don’t reinvent the wheel unless you have 
to — consult the list of distance education 
and evaluation resources in appendix A and 
identify program peers you can contact about 
their evaluation activities. 

• Participate in the community of evaluators
and researchers studying K-12 online learning.
Share evaluation tools and processes with
others. Make them available online. Consider
publishing in professional journals. Seek out
networking venues such as conferences.

• Use caution when interpreting evaluation 
findings from other programs, or adapting 
their methods. Online learning programs 
“come in many shapes and sizes,” and find- 
ings about one online program or set of Web 
resources are not always generalizable. Take 
time to understand the program being studied 
and its context before deciding if the findings 
or methods are relevant and appropriate. 

• If developing new tools for collecting data
or new processes for analyzing them, work
collaboratively with leaders of similar
programs or experts from other agencies to
fill gaps in knowledge.

Recommendations for Evaluating 
Multifaceted Resources 

• Multifaceted programs include a range of 
program activities and may struggle to de- 
fine a clear, central purpose. Evaluators can 
help such programs define and refine their 
purposes and goals. Establishing key perfor- 
mance measures aligned to funder-mandated 
and project goals can help program managers 
prioritize and help evaluators identify what is 
most important to study. 

• When evaluating a multifaceted educational 
program, consider an evaluation strategy that 
combines both breadth and depth. For ex- 
ample, collect broad information about usage 
and select a particular program feature or re- 
source to examine more deeply. 

• Where feasible, consider a multiyear evalua- 
tion plan that narrows its focus in each suc- 
cessive year, or examines a different resource 
each year. 

• To select a particular program feature or re- 
source for deeper examination, think about 
what each resource is intended to do, or what 
outcomes one would hope to see if the resource
were being used effectively. Work with
an evaluator to determine which resource is 
best suited for a more thorough evaluation. 

Recommendations for Programs 
Considering a Comparison Study 

• Seek to determine if a comparison study is
appropriate, and, if it is, whether there is a
viable way to compare program participants with
others. Consider the online program’s goals,
the student population being served, and the
program’s structure. For example, online
programs dedicated to credit recovery would not
want to compare their student outcomes with
those of the general student population.

• Clearly articulate the purpose of the compari- 
son. Is the evaluation seeking to find out if 
the online program is just as effective as the 
traditional one or more effective? For exam- 
ple, “just as effective” findings are desirable 
and appropriate when online programs are 
being used to expand education access. 

• If considering a quasi-experimental design 
(see Glossary of Common Evaluation Terms, 
p. 65), plan carefully for what classes will be 
used as control groups, and thoroughly as- 
sess the ways they are similar to and differ- 
ent from the treatment classes. What kinds 
of students does each class serve? Did the 
students (or teachers) choose to participate 
in the class or were they assigned to it ran- 
domly? Are students taking the treatment and 
control classes for similar reasons (e.g., credit 
recovery, advanced learning)? Are the stu- 
dents at a similar achievement level? Where 
feasible, use individual student record data to 
match treatment and comparison students or 
to hold constant differences in prior learning 
characteristics across the two groups. 

• If considering a randomized controlled trial 
(see Glossary of Common Evaluation Terms, 
p. 65), determine how feasible it is for the 
particular program. Is it possible to assign 
students randomly either to receive the treat- 
ment or be in the control group? Other practi- 
cal considerations: Can the control group stu- 
dents receive the treatment at a future date? 
What other incentives can be offered to en- 
courage them to participate? Will control and 
treatment students be in the same classroom 
or school? If so, might this cause “contamina- 
tion” between treatment and control groups? 

• If a randomized controlled trial or
quasi-experimental study is planned, offer
meaningful incentives to participating
individuals and schools. When deciding on
appropriate incentives, consider the total
time commitment that will be asked of study
participants.

• Study sites with no vested financial interest 
are more likely to withdraw or to fail to carry 
through with study requirements that differ 
from typical school practices. If compliance 
appears unlikely, do not attempt an experi- 
mental or quasi-experimental study, unless 
it is an explicit requirement of a mandated 
evaluation. 

• When designing data management systems, 
keep in mind the possibility of comparisons 
with traditional settings. Collect and organize 
data in a way that makes such comparisons 
possible. 
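
For programs weighing a randomized controlled trial, the mechanics of random assignment are simple; the hard parts are the practical questions raised in the list above. The Python sketch below shows one possible way to split a roster into treatment and control groups. The student identifiers and the seed value are hypothetical, and recording a seed is only one way to make an assignment reproducible for later review.

```python
import random


def assign_randomly(student_ids, seed):
    """Randomly split a roster into treatment and control groups.

    The seed is recorded so that the same assignment can be regenerated
    and checked later. With an odd-sized roster, the control group
    receives the extra student.
    """
    rng = random.Random(seed)      # private generator; global state untouched
    shuffled = list(student_ids)   # copy so the caller's roster is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "treatment": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }


# Hypothetical roster of 20 students, identified by number rather than name.
roster = [f"S{n:03d}" for n in range(1, 21)]
groups = assign_randomly(roster, seed=2005)
print(len(groups["treatment"]), "treatment;", len(groups["control"]), "control")
```

Assigning whole classrooms or schools instead of individual students, which is one response to the "contamination" concern, would use the same function with classroom or school identifiers in place of student identifiers.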

Recommendations for Gathering Valid 
Evaluation Data 

• Build in adequate time to fully communicate 
the purpose and design of the evaluation to 
everyone involved. Inform program staff who 
will play a role in the evaluation, as well as 
anyone who will help you gather evaluation 
evidence. Explain how study participants will 
benefit from the evaluation. 

• Be prepared to repurpose the methods you 
use to conduct an evaluation, without los- 
ing sight of the evaluation’s original purpose. 
Collecting multiple sources of evidence relat- 
ed to the same evaluation question can help 
to ensure that the evaluator can answer the 
question if a data source becomes unavail- 
able or a research method proves infeasible. 

• Seek out valid and reliable instruments for 
gathering data. Existing data-gathering instru- 
ments that have been tested and refined can 
offer higher-quality data than locally devel- 
oped instruments. 

• Consider administering surveys online, to boost 
response rates and easily compile results. 



• Consider if there are innovative ways to col- 
lect data electronically about how online 
resources are used (e.g., tracking different 
pathways users take as they navigate through 
a particular online tool or Web site, or how 
much time is spent on different portions of a 
Web site or online course). 

• If response rates are too low, consider re- 
designing or refocusing the evaluation. Find 
other indicators that get at the same phe- 
nomenon. Find other data sources that put 
more control into the hands of evaluators 
(e.g., observations, focus groups). 

• If collecting and aggregating data across mul- 
tiple sources, define indicators clearly and 
make sure that data are collected in com- 
patible formats. Define exactly what is to 
be measured, and how, and distribute these 
instructions to all parties who are collecting 
data. 

• Research relevant data privacy regulations 
well in advance. 

• Determine the process and build in ade- 
quate time for requesting student data from 
states, districts, or schools. Determine early 
on whether data permissions will be needed 
and from whom, and how likely it is that 
sensitive data will actually be available. Have 
a “Plan B” if it is not. 

• If the program does not have good access 
to the complete school record of enrolled 
students, encourage it to support school 
improvement and program evaluator needs 
by collecting NCLB subgroup and other key 
student record data as part of its regular pro- 
gram registration process. 

• Consider creative and flexible arrangements
for protecting student privacy (e.g., sign
confidentiality agreements, physically travel to a
specific site to analyze data, employ a
disinterested third party to receive data sets and
strip out any identifying information).

• Incorporate precautions into the study
protocol to protect students’ privacy. When
possible, avoid contact with students’ names and
refer to students through student
identification numbers.
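
The privacy precautions in the last two recommendations can be made concrete with a small example. The Python sketch below shows one way a third party might strip identifying fields from student records and hand evaluators only study-specific identifiers. The field names, the list of fields treated as identifying, and the sample records are all hypothetical; which fields must actually be removed depends on the data set and on the privacy regulations that apply.

```python
# Fields treated as identifying in this sketch (an assumption, not a
# complete or authoritative list).
IDENTIFYING_FIELDS = {"name", "address", "date_of_birth"}


def strip_identifiers(records):
    """Return (cleaned_records, crosswalk).

    cleaned_records carry a study ID in place of the school-issued
    student ID and omit every identifying field. The crosswalk maps
    study IDs back to student IDs and would stay with the third party,
    not with the evaluators.
    """
    crosswalk = {}
    cleaned = []
    for n, record in enumerate(records, start=1):
        study_id = f"P{n:04d}"
        crosswalk[study_id] = record["student_id"]
        kept = {
            field: value
            for field, value in record.items()
            if field not in IDENTIFYING_FIELDS and field != "student_id"
        }
        cleaned.append({"study_id": study_id, **kept})
    return cleaned, crosswalk


# Fictional records for illustration only.
records = [
    {"student_id": "A-1001", "name": "Student One",
     "date_of_birth": "1992-01-01", "grade": 9, "math_scale_score": 512},
    {"student_id": "A-1002", "name": "Student Two",
     "date_of_birth": "1991-06-15", "grade": 10, "math_scale_score": 478},
]
cleaned, crosswalk = strip_identifiers(records)
for row in cleaned:
    print(row)
```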

Recommendations for Taking Program 
Maturity Into Account 

• All education programs go through
developmental stages. Different kinds of
evaluation are appropriate at each stage. 16 Based on
when they started and other factors, different 
components of a program may be in different 
development stages during an evaluation. 

• In the early stages of implementation, focus
on formative evaluation efforts. Hold off on
summative evaluations until users have had
time to adapt to the program and there has been
adequate time to revise the program design
based on early feedback and observations.

• Once a stable program model has been 
achieved, study of education outcomes is ap- 
propriate. Finally, as the program or program 
component matures, its long-term sustainabil- 
ity and replicability should be considered. 

• Program leaders and evaluators should work 
together to help program stakeholders un- 
derstand and interpret how program maturity 
can affect evaluation findings. 

• Program leaders and evaluators should work 
together to establish baselines and bench- 
marks, so that progress over time toward 
program goals can be measured. 

Recommendations for Translating 
Evaluation Findings Into Action 

• Online learning programs often exist within 
larger learning organizations and policy and 
practice frameworks that encourage or inhibit 
program success. Use evaluation results to en- 
courage needed changes in program context. 



• Communicate evaluation results to key
stakeholders, such as parents and students,
teachers, schools, administrators,
policymakers, and the general public. Doing so
can help online learning programs
demonstrate their value, demonstrate
accountability, and dispel myths
about the nature of online learning.

• If the resulting recommendations are action- 
able, online learning programs should move 
quickly to implement them. Don’t just file the 
evaluation away; make it a living document 
that informs the program. 

• Work to create an ongoing relationship among 
evaluators, program leaders, and staff. Cre- 
ate ongoing opportunities to talk about the 
evaluation, its findings, and its implications 
for improving the program. 

• Engage early with anyone whose behavior 
will be expected to change as a result of the 
evaluation findings — including personnel at 
distant sites who help manage the program. 
Introduce them to evaluators and communi- 
cate the purpose of the evaluation. 

• Institutionalize process improvement and 
performance measurement practices put in 
place during the evaluation. Consider us- 
ing a structured, formal process for turning 
evaluation recommendations into program 
improvements, including timelines and staff 
assignments. 

• Review evaluation recommendations regu- 
larly and continually track the progress that 
has been made toward them. 

• Rank evaluation recommendations both in 
order of importance and feasibility. Develop 
a plan for addressing the highest priority rec- 
ommendations first. 

• Continue to conduct evaluation activities
internally or externally to continuously improve
your program over time.

Alabama Connecting Classrooms,
Educators, & Students Statewide
Distance Learning
Alabama

The Alabama Connecting Classrooms, Educa- 
tors, & Students Statewide (ACCESS) Distance 
Learning Initiative aims to provide all Alabama 
high school students “equal access to high qual- 
ity instruction to improve student achievement.” 
ACCESS was developed to support and expand 
existing distance learning initiatives in Alabama 
and to heighten their impact on student achieve- 
ment. In particular, the aim was to provide more 
courses to students in small and rural schools. 
ACCESS courses are Web-based, utilize interactive
videoconferencing (IVC) platforms, or combine
both technologies. ACCESS offers courses
in core subjects, plus foreign languages,
electives, remedial courses, and advanced courses,
including Advanced Placement (AP)* and dual
credit courses. The courses are developed and
delivered by Alabama-certified teachers. In
addition to distance learning courses for students,
ACCESS also provides teachers with professional
development and multimedia tools to enhance
instruction. ACCESS is coordinated by the
Alabama State Department of Education and three
regional support centers. By the end of 2006,
ACCESS was serving almost 4,000 online users and
over 1,000 IVC users and, ultimately, is intended
to reach all public high schools in the state.

* Run by the nonprofit College Board, the Advanced Placement program offers
college-level course work to high school students. Many institutions of higher
education offer college credits to students who take AP courses.

Algebra I Online 

Louisiana 

Algebra I Online began as a program under the 
Louisiana Virtual School, an initiative of the Lou- 
isiana Department of Education that provides 
the state’s high school students with access to 
standards-based high school courses delivered 
by certified Louisiana teachers via the Internet. 
The program has two goals: to increase the
number of students taught by highly qualified 
algebra teachers, especially in rural and urban 
areas, and to help uncertified algebra teachers 
improve their skills and become certified. In 
Algebra I Online courses, students participate
in a face-to-face class that meets at their home
school as part of the regular school day, and 
each student has a computer connected to the 
Internet. A teacher who may or may not be 
certified to deliver algebra instruction is physi- 
cally present in the classroom to facilitate stu- 
dent learning, while a highly qualified, certified 
teacher delivers the algebra instruction online. 
The online instructor’s responsibilities include 
grading assignments and tests, and submitting 
course grades; the in-class teacher oversees 
the classroom and works to create an effective 
learning environment. The online and in-class 
teachers communicate regularly during the year 
to discuss student progress. The algebra course 
is standards-aligned and incorporates e-mail, in- 
teractive online components, and video into the 
lessons. By the 2005-06 school year, 350 stu- 
dents in 20 schools were taking Algebra I On- 
line courses, and five participating teachers had 
earned secondary mathematics certification. 

Appleton eSchool 

Wisconsin 

Based in Wisconsin’s Appleton Area School 
District, Appleton eSchool is an online char- 
ter high school intended to provide high-qual- 
ity, self-paced online courses. A few students 
choose to take their entire high school course 
load through Appleton eSchool, but most use 
the online courses to supplement those avail- 
able at their home school. The school is open 
to all district high school students, but it makes 
special efforts to include those with significant 
life challenges. Appleton eSchool offers core 
subjects, electives (e.g., art, economics, Web 
design, “thinking and learning strategies”), and 
AP courses. Students can gain access to their
Web-based courses around the clock and sub- 
mit their assignments via the Internet. They can 
communicate with teachers using e-mail and 
online discussions, and receive oral assessments 
and tutoring by telephone and Web conference 
tools. Teachers regularly communicate student 
progress with a contact at the student’s local 
school and with each student’s mentor (usually 
the student’s parent), whose role is to provide 
the student with assistance and encouragement. 
By 2007-08, the school was serving 275 students 
enrolled in over 500 semester courses. In addi- 
tion, the 2007 summer session included another 
400 semester course enrollments. 

Arizona Virtual Academy 

Arizona 

Arizona Virtual Academy (AZVA) is a pub- 
lic charter school serving students in grades 
kindergarten through 11 across Arizona. By 
2006-07, AZVA was serving approximately 
2,800 students up through the 10th grade and 
piloting a small 11th-grade program. AZVA offers
a complete selection of core, elective, and
AP courses, including language arts, math, sci- 
ence, history, art, music, and physical educa- 
tion. Courses are developed by veteran public 
and private school teachers and supplied to 
AZVA by a national curriculum provider, K12 
Inc. In 2006-07, AZVA used Title I funds[17] to
add a supplemental math program for strug- 
gling students in grades 3 through 8. In all 
AZVA courses, certified teachers provide in- 
struction and keep track of students' progress. 
Students participate in structured activities, and 
also study independently under the guidance 
of an assigned mentor. When families enroll 
with AZVA, the program sends them curricular
materials, accompanying textbooks and work- 
books, supplemental equipment and supplies, 
and a computer and printer, the latter two on 
loan. Students are assessed regularly, both for 
placement in the appropriate course level and 
to determine their mastery of course content.

Chicago Public Schools' Virtual High School

Chicago

The Chicago Public Schools’ Virtual High School 
(CPS/VHS) is intended to expand access to high- 
quality teachers and courses, especially for stu- 
dents who traditionally have been underserved. 
The program is a partnership with the Illinois 
Virtual High School (IVHS) — a well-established 
distance learning program serving the entire 
state. CPS/VHS offers a variety of online courses. 
To participate, students must meet prerequisites 
and are advised of the commitment they will 
need to make to succeed in the class. CPS/VHS 
leaders are clear that, in this program, online 
does not mean independent study: Courses run 
for a semester, students are typically scheduled 
to work online during the regular school day, 
and attendance during that time is required. On- 
site mentors are available to help students as 
they progress through the online classes. De- 
spite this regular schedule, the program still of- 
fers flexibility and convenience not available in 
a traditional system because students can com- 
municate with instructors and access course 
materials outside of regular class hours. Today, 
CPS/VHS offers over 100 online courses, and 
new classes are added when the district deter- 
mines that the course will meet the needs of 
approximately 60 to 75 students. 



Digital Learning Commons 

Washington 

Based in the state of Washington, the Digital 
Learning Commons (DLC) is a centrally hosted 
Web portal that offers a wide range of services 
and resources to students and teachers. DLC’s 
core goal is to offer education opportunities and 
choices where they have not previously existed 
due to geographic or socioeconomic barriers. 
Through DLC, middle and high school students 
can access over 300 online courses, including 
all core subjects and various electives, plus Ad- 
vanced Placement classes and English as a Sec- 
ond Language. DLC students also have access to 
online student mentors (made possible through 
university partnerships), and college and ca- 
reer-planning resources. DLC hosts an extensive 
digital library, categorized by subject area, for 
students, teachers, and parents. In addition, it 
includes resources and tools for teachers, in- 
cluding online curricula, activities, and diagnos- 
tics. It is integrated with Washington’s existing 
K-20 Network — a high-speed telecommunica- 
tions infrastructure that allows Washington’s 
K-12 schools to use the Internet and interac- 
tive videoconferencing. When schools join DLC, 
program staff help them create a plan for using 
the portal to meet their needs and, also, provide 
training to help school faculty and librarians in- 
corporate its resources into the classroom. 

Thinkport 

Maryland 

Maryland Public Television has partnered with 
Johns Hopkins University Center for Technolo- 
gy in Education to create Thinkport, a Web site 
that functions as a one-stop shop for teacher
and student resources. Thinkport brings to- 
gether quality educational resources and tools 
from trusted sources, including the Library of 
Congress, the U.S. Department of Education, 
PBS, the Kennedy Center, the National Coun- 
cil of Teachers of English, National Geographic, 
and the Smithsonian Institution. The site is or- 
ganized into four sections: classroom resources, 
career resources for teachers, instructional tech- 
nology, and family and community resources. 
About 75 percent of the site’s content is for 
teachers, including lesson plans and activities 
for students, all of which include a technology
component. Thinkport also offers podcasts,
video clips, blogs, and games, with accompa- 
nying information about how they can be used 
effectively in classrooms. One of Thinkport’s 
most popular features is its collection of elec- 
tronic field trips — a number of which were de- 
veloped by Maryland Public Television under 
the U.S. Department of Education’s Star Schools 
grant, a federal program that supports distance 
learning projects for teachers and students in 
underserved populations. Each field trip is es- 
sentially an online curricular unit that focuses in 
depth on a particular topic and includes many 
interactive components for students and accompanying
support materials for teachers.

Resources



The resources listed below are intended to provide
readers with ready access to further information
about evaluating online learning. This
is not a complete list, and there may be other 
useful resources on the topic. Selection was 
based on the criteria that resources be relevant 
to the topic and themes of this guide; current 
and up-to-date; from nationally recognized or- 
ganizations, including but not limited to federal 
or federally-funded sources; and that they of- 
fer materials free of charge. This listing offers a 
range of research, practical tools, policy infor- 
mation, and other resources. 

Distance Learning 

International Society for Technology in Education
The International Society for Technology in Education
(ISTE) is a nonprofit organization with
more than 85,000 members. ISTE provides ser- 
vices, including evaluation, to improve teach- 
ing, learning, and school leadership through 
the use of technology. In 2007, ISTE published 
a comprehensive overview of effective online 
teaching and learning practices, What Works 
in K-12 Online Learning. Chapter topics include
virtual course development, online learning
in elementary classrooms, differentiating instruction
online, professional development for
teachers of virtual courses, and the challenges 
that virtual schools will face in the future.
http://www.iste.org

The North American Council for Online 
Learning 

The North American Council for Online Learn- 
ing (NACOL) is an international K-12 non- 
profit organization focused on enhancing K-12 
online learning quality. NACOL publications, 
programs, and research areas include the National
Standards of Quality for Online Courses,
National Standards for Quality Online Teach- 
ing, state needs assessments for online courses 
and services, online course quality evaluations, 
online professional development, virtual educa- 
tion program administration, funding, and state 
and federal public policy. In addition, NACOL 
has published a primer on K-12 online learn- 
ing, which includes information about teach- 
ing, learning, and curriculum in online environ- 
ments, and evaluating online learning. 
http://www.nacol.org 

North Central Regional Educational Laboratory/ 
Learning Point Associates 

The North Central Regional Educational Labora- 
tory (NCREL) was a federally funded education 
laboratory until 2005. Learning Point Associates 
conducted the work of NCREL and now operates 
a regional educational laboratory (REL) with a 
new scope of work and a new name: REL Mid- 
west. NCREL publications about online learning 
are currently available on the Learning Point As- 
sociates Web site, including A Synthesis of New 
Research on K-12 Online Learning (2005), which 
summarizes a series of research projects spon- 
sored by NCREL and includes recommendations 
for online research, policy, and practice.
http://www.ncrel.org/tech/elearn.htm

Southern Regional Education Board 
The Southern Regional Education Board (SREB) 
is a nonprofit organization that provides educa- 
tion reform resources to its 16 member states. 
The SREB Educational Technology Cooperative 
focuses on ways to help state leaders create 
and expand the use of technology in educa- 
tion. SREB’s Web site provides the Standards for 
Quality Online Courses and Standards for Qual- 
ity Online Teaching. In 2005, SREB partnered 
with the AT&T Foundation to create the State 
Virtual Schools Alliance to help SREB’s
16 member states increase middle- and high-school
students’ access to rigorous academic
courses through state-supported virtual schools. 
Through AT&T Foundation grant funding, the 
alliance facilitates collaboration and information 
and resource sharing between states in order to 
create and improve state virtual schools.
http://www.sreb.org

Star Schools Program 

The Star Schools Program, housed in the U.S. 
Department of Education’s Office of Innova- 
tion and Improvement, supports distance edu- 
cation programs that encourage improved in- 
struction across subjects and bring technology 
to underserved populations. Information about 
the program and its evaluation is available on 
the Department of Education’s Web site. The 
Star Schools Program is funding this guide.
http://www.ed.gov/programs/starschools/index.html

Evaluation Methods and Tools 

Institute of Education Sciences 
In 2002, the Education Sciences Reform Act cre- 
ated the Institute of Education Sciences (IES) 
as an institute dedicated to conducting and 
providing research, evaluation, and statistics in 
the education field. IES maintains a registry of 
evaluators and has published the two guides 
described below. The Regional Educational 
Laboratory Program is also housed within IES. 
http://ies.ed.gov

Identifying and Implementing Educational Practices
Supported by Rigorous Evidence: A User
Friendly Guide

Published by the U.S. Department of Educa- 
tion, this guide is meant to help practitioners 
understand the meaning of “rigorous evidence” 
and important research and evaluation terms, 
such as “randomized controlled trials.” Under- 
standing these terms and the different types of 
evidence in the education field can help when
designing an internal evaluation or working 
with an external agency. 
http://www.ed.gov/rschstat/research/pubs/rigorousevid/index.html

Random Assignment in Program Evaluation and 
Intervention Research: Questions and Answers 
This document, published by the Institute of 
Education Sciences in the U.S. Department of 
Education, discusses the purpose of education 
program evaluation generally, and answers spe- 
cific questions about studies that use random as- 
signment to determine program effectiveness. 
http://ies.ed.gov/ncee/pubs/randomqa.asp 

Regional Educational Laboratory Program 
The Regional Educational Laboratory (REL) Pro- 
gram is a network of 10 applied research lab- 
oratories that serve the education needs of the 
states within a designated region by providing 
access to scientifically valid research, studies, and 
other related technical assistance services. The 
labs employ researchers experienced in scientific 
evaluations who can provide technical assistance 
on evaluation design. The REL Web site provides 
contact information for each of the labs. 
http://ies.ed.gov/ncee/edlabs/regions 

Local School System Planning, Implementation, 
and Evaluation Guide 

This guide from Maryland Virtual Learning Op- 
portunities is a useful resource when consider- 
ing program implementation. Planning consid- 
erations are divided into a three-part checklist 
of planning, implementation, and evaluation. 
Additionally, suggested roles and responsibili- 
ties are provided for both district- and school- 
based personnel. The guide can be found on- 
line at the following URL, using the link on the 
left-hand side for the “Planning, Implementa- 
tion, and Evaluation Guide.” 
http://mdk12online.org/docs/PIEGuide.pdf

Online Program Perceiver Instrument 

The Online Program Perceiver Instrument
(OPPI) was designed by the staff at Appleton
eSchool in Wisconsin as an online evaluation
tool to be used by Wisconsin’s network of vir- 
tual schools. The tool is featured as an internal 
evaluation process in this guide and is currently 
available for all online learning programs across 
the country to use. The Web site provides infor- 
mation about the framework and an overview 
of how the instrument works, as well as contact 
information and directions on how to become a 
member of the OPPI network. 
http://www.wisconsineschool.net/OPPI/OPPI.asp

Program Evaluation Standards 
This document is a 1994 publication of the 
Joint Committee on Standards for Educational 
Evaluation, a coalition of professional associa- 
tions concerned with the quality of evaluation. 
The standards address utility, feasibility, propriety,
and accuracy issues, and are intended
for use in checking the design and operation of 
an evaluation.
http://www.wmich.edu/evalctr/jc

2005 SETDA National Leadership Institute Tool- 
kit on Virtual Learning 

The State Educational Technology Directors As- 
sociation (SETDA) was developed to provide 
national leadership and facilitate collaboration 
between states on education technology issues. 
Each year, SETDA hosts a National Leadership 
Institute and develops toolkits intended to help 
educators effectively use virtual learning. The 
2004-05 toolkit includes a lengthy section on
program evaluation.
http://www.setda.org/toolkit/toolkit2004/index.htm

Resources From Higher Education 

Council for Higher Education Accreditation 
The Council for Higher Education Accreditation 
(CHEA) has a number of publications identi- 
fying standards and best practices in distance 
learning. These include Accreditation and
Assuring Quality in Distance Learning and Best
Practices for Electronically Offered Degree and
Certificate Programs, available at the following
Web sites, respectively:
http://www.chea.org/Research/Accred-Distance-5-9-02.pdf
http://www.ncahlc.org/download/Best_Pract_DEd.pdf

Quality Matters 

Quality Matters is a multi-partner project funded 
in part by the U.S. Department of Education’s 
Fund for the Improvement of Postsecondary 
Education (FIPSE). Quality Matters has created 
a rubric and process for certifying the quality of 
online courses.
http://www.qualitymatters.org/index.htm

Sloan Consortium

The Sloan Consortium (Sloan-C) is a consor- 
tium of institutions and organizations commit- 
ted to quality online education. It aims to help 
learning organizations improve the quality of 
their programming, and has a report identifying 
five “pillars” of quality higher education online 
programs: learning effectiveness, student satis- 
faction, faculty satisfaction, cost effectiveness, 
and access. Sloan-C also has a Web site that 
collects information about best practices within 
each of these areas.
http://www.sloan-c.org

These pages identify resources created and 
maintained by other public and private organizations.
This information is provided for the
reader’s convenience. The U.S. Department of
Education is not responsible for controlling or 
guaranteeing the accuracy, relevance, timeli- 
ness, or completeness of this outside informa- 
tion. Further, the inclusion of these resources 
does not reflect their importance, nor is it intended
to endorse any views expressed, or products
or services offered.

APPENDIX B

Methodology

The research approach underlying this guide is 
a combination of case study methodology and 
benchmarking of best practices. Used in busi- 
nesses worldwide as they seek to continuously 
improve their operations, benchmarking has 
more recently been applied to education for 
identifying promising practices. Benchmarking 
is a structured, efficient process that targets key 
operations and identifies promising practices 
in relationship to traditional practice, previous 
practice at the selected sites (lessons learned), 
and local outcome data. The methodology is 
further explained in a background document,[18]
which lays out the justification for identifying 
promising practices based on four sources of 
rigor in the approach: 

• Theory and research base 

• Expert review 

• Site evidence of effectiveness 

• Systematic field research and cross-site
analysis

The steps of the research process were: defining 
a study scope; seeking input from experts to re- 
fine the scope and inform site selection criteria; 
screening potential sites; selecting sites to study; 
conducting site interviews, visits, or both; collect- 
ing and analyzing data to write case reports; and 
writing a user-friendly guide. 

Site Selection Criteria and Process 

In this guide, the term “online learning program” 
is used to describe a range of education programs 
and settings in the K-12 arena, including distance 
learning courses offered by universities, private 
providers, or teachers at other schools; stand-alone
“virtual schools” that provide students with a full 
range of online courses and services; and Web 
portals that provide teachers, parents, and students 
with a variety of online tools and supplementary 
education materials. As a first step in the study 
underlying this guide, researchers compiled a list 
of evaluations of K-12 online programs that had 
been conducted by external evaluators, research 
organizations, foundations, and program leaders. 
This initial list, compiled via Web and document 
searches, was expanded through referrals from a 
six-member advisory group and other knowledge- 
able experts in the field. Forty organizations and 
programs were on the final list for consideration. 

A matrix of selection criteria was drafted and 
revised based on feedback from advisors. The 
three quality criteria were: 

• The evaluation included multiple outcome 
measures, including student achievement. 

• The findings from the evaluation were widely 
communicated to key stakeholders of the pro- 
gram being studied. 

• Program leaders acted on evaluation results. 

Researchers then rated each potential site on 
these three criteria, using publicly available in- 
formation, review of evaluation reports, and gap- 
filling interviews with program leaders. All the 
included sites scored at least six of the possible 
nine points across these three criteria. 
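
As an illustration only, the rating step described above can be sketched in Python. The text gives nine possible points across three criteria and a minimum of six for inclusion; the 0-to-3 scale per criterion, the criterion names used as keys, and the example ratings are assumptions made for this sketch, not details reported by the study.

```python
# Hypothetical keys standing in for the three quality criteria in the text.
CRITERIA = (
    "multiple_outcome_measures",
    "findings_communicated",
    "leaders_acted_on_results",
)
MAX_PER_CRITERION = 3  # assumption: 3 criteria x 3 points = 9 possible points
MIN_TOTAL = 6          # from the text: included sites scored at least six


def total_score(ratings):
    """Add up a site's ratings after checking each is on the assumed scale."""
    for name in CRITERIA:
        if not 0 <= ratings[name] <= MAX_PER_CRITERION:
            raise ValueError(f"{name} must be between 0 and {MAX_PER_CRITERION}")
    return sum(ratings[name] for name in CRITERIA)


def meets_quality_bar(ratings):
    """Return True when the site reaches the minimum total for inclusion."""
    return total_score(ratings) >= MIN_TOTAL


# Made-up ratings for two hypothetical sites.
site_a = {"multiple_outcome_measures": 3, "findings_communicated": 2,
          "leaders_acted_on_results": 2}
site_b = {"multiple_outcome_measures": 2, "findings_communicated": 2,
          "leaders_acted_on_results": 1}

print(total_score(site_a), meets_quality_bar(site_a))
print(total_score(site_b), meets_quality_bar(site_b))
```

Keeping the threshold and the scale as named constants makes it easy to see which numbers come from the text and which are assumed.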

Because the goal of the publication was to 
showcase a variety of types of evaluations, the 
potential sites were also coded on additional
characteristics: internal vs. external evaluator,
type of evaluation design, type of online
learning program, organizational unit (district or
state), and stage of maturity. The final selection
was made to draw from as wide a range on 
these characteristics as possible while keeping 
the quality criteria high, as described above. 

Data Collection 

Data were collected through a combination of 
on-site and virtual visits. Because the program 
sites themselves were not brick-and-mortar, 
phone interviews were generally sufficient and 
cost-effective. But some site visits were con- 
ducted face-to-face to ensure access to all avail- 
able information. Semistructured interviews were 
conducted with program leaders, other key pro- 
gram staff, and evaluators. Key interviews were 
tape recorded to ensure lively descriptions and 
quotes using natural language. While conducting 
the case studies, staff also obtained copies of lo- 
cal documents, including evaluation reports and 
plans documenting use of evaluation findings. 

Program leaders and evaluators were asked to: 

• Describe the rationale behind the evaluation 
and, if applicable, the criteria for choosing an 
external evaluator; 

• Explain the challenges and obstacles that were 
faced throughout the evaluation process, and 
how they were addressed; 

• Tell how the study design was affected by 
available resources; 

• If the evaluation was conducted externally, 
describe the relationship between the program 
and contractor; 

• Provide the framework used to design and 
implement the evaluation; 

• Tell how the appropriate measures or indica- 
tors were established; 

• Explain how the indicators are aligned to lo- 
cal, state, and/or national standards, as well as 
program goals; 

• Describe the data collection tools; 

• Explain the methods used for managing and 
securing data;

• Describe how data were interpreted and re- 
ported; and 

• Share improvements made in program services 
and the evaluation process. 

Analysis and Reporting 

A case report was written about each program and 
its evaluation and reviewed by program leaders 
and evaluators for accuracy. Drawing from these 
case reports, program and evaluation documen- 
tation, and interview transcripts, the project team 
identified common themes about the challenges 
faced over the course of the evaluations and what 
contributed to the success of the evaluations. This 
cross-site analysis built on both the research lit- 
erature and on emerging patterns in the data. 

This descriptive research process suggests prom- 
ising practices — ways to do things that others 
have found helpful, lessons they have learned — 
and offers practical how-to guidance. This is not 
the kind of experimental research that can yield 
valid causal claims about what works. Readers 
should judge for themselves the merits of these 
practices, based on their understanding of why 
they should work, how they fit the local context, 
and what happens when they actually try them. 
Also, readers should understand that these de- 
scriptions do not constitute an endorsement of 
specific practices or products. 

Using the Guide 

Ultimately, readers of this guide will need to se- 
lect, adapt, and implement practices that meet 
their individual needs and contexts. Evaluators 
of online programs, whether internal or external, 
may continue to study the issues identified in this 
guide, using the ideas and practices and, indeed, 
the challenges, from these program evaluations 
as a springboard for further discussion and explo- 
ration. In this way, a pool of promising practices 
will grow, and program leaders and evaluators 
alike can work together toward finding increas- 
ingly effective approaches to evaluating online 
learning programs.

Control group refers to the population of subjects
(e.g., students) who do not receive or participate 
in the treatment being studied (e.g., an online 
class) but whose performance or other outcomes 
are being compared to those of students who 
will receive or participate in the treatment. 

Formative evaluations generate information 
aimed at helping program stakeholders better 
understand a program or its participants, often 
by examining the delivery or implementation of 
a program. Findings from these evaluations are 
generally used to make program improvements 
or influence future decisions. 

Hierarchical linear modeling, also called 
multi-level modeling, is used for the same pur- 
pose as regression analysis — to understand what 
factors are the best predictors of an outcome, 
such as a test score. But researchers use hierar- 
chical linear modeling to take into account fac- 
tors at different levels of an education system, 
such as the characteristics of the class or school 
in which students are situated. Hierarchical lin- 
ear modeling helps statisticians address the fact 
that students are generally not grouped ran- 
domly within classes or schools and that class- 
room- and school-level factors are often related 
to student outcomes. 
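
In notation, a simple two-level model of this kind might be written as follows. This is a generic textbook form offered as a sketch, not the specification used in any evaluation described in this guide. The symbols are illustrative: Y_ij is the outcome (such as a test score) for student i in school j, X_ij is a student-level characteristic, and W_j is a school-level characteristic.

```latex
% Level 1 (students within schools)
Y_{ij} = \beta_{0j} + \beta_{1} X_{ij} + r_{ij}

% Level 2 (schools): each school's intercept depends on a school-level factor
\beta_{0j} = \gamma_{00} + \gamma_{01} W_{j} + u_{0j}

% Combined model
Y_{ij} = \gamma_{00} + \gamma_{01} W_{j} + \beta_{1} X_{ij} + u_{0j} + r_{ij}
```

The school-level error term u_0j is what allows students in the same school to resemble one another, which an ordinary regression would ignore.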

Quasi-experiments are experimental studies
in which subjects are not assigned at random
to treatment and control groups, as they are in RCTs
(see below). Quasi-experimental studies may be
used, for example, when controlled trials are 
infeasible (e.g., when evaluators cannot assign 
students randomly to participate in a treatment) 
or are considered too expensive. According to 
the U.S. Department of Education’s What Works 
Clearinghouse, strong evidence of a program’s 
effectiveness can be obtained from a quasi- 
experimental study based on one of three de- 
signs: one that “equates” treatment and control 
groups, either by matching groups based on key 
characteristics of participants or by using statisti- 
cal methods to account for differences between 
groups; one that employs a discontinuity design 
in which participants are assigned to the treat- 
ment and control groups based on a cutoff score 
on a pretreatment measure that typically assess- 
es need or merit; or one that uses a “single-case 
design” involving repeated measurement of a 
single subject (e.g., a student or a classroom) in 
different conditions or phases over time.[19]
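
The assignment rule behind a discontinuity design can be sketched in a few lines of Python. This is a minimal illustration, assuming that students scoring below the cutoff on a pretest receive the treatment (a need-based rule); the function name, student labels, scores, and cutoff are all made up for the example.

```python
def assign_by_cutoff(pretest_scores, cutoff):
    """Split participants by a pretest cutoff.

    Assumption for this sketch: scores below the cutoff indicate need,
    so those students are placed in the treatment group.
    """
    treatment = sorted(
        name for name, score in pretest_scores.items() if score < cutoff
    )
    control = sorted(
        name for name, score in pretest_scores.items() if score >= cutoff
    )
    return treatment, control


# Made-up pretest scores for four hypothetical students.
pretest = {"student_a": 42, "student_b": 61, "student_c": 55, "student_d": 70}
treatment, control = assign_by_cutoff(pretest, cutoff=60)
print("treatment:", treatment)
print("control:", control)
```

Because assignment depends only on the cutoff, evaluators can compare students who fall just above and just below it.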

Randomized controlled trials (RCTs) are ex- 
perimental studies that randomly assign some 
study participants to receive a treatment (e.g., 
participation in a class or program) and others
to not receive the treatment. The latter group is
known as the control group. In an RCT, evaluators
compare the outcomes (e.g., test scores)
of the treatment group with those of the control
group; these results are used to determine the
effectiveness of the treatment. RCTs can provide
strong evidence of a program’s effectiveness. 
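
The two steps of an RCT described above, random assignment and comparison of outcomes, can be sketched in Python. This is a simplified illustration under stated assumptions: the function names, the student labels, and the fixed seed are hypothetical, and a real evaluation would also test whether the difference is statistically significant.

```python
import random
from statistics import mean


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two equal-sized groups."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_difference(treatment_scores, control_scores):
    """Average treatment outcome minus average control outcome."""
    return mean(treatment_scores) - mean(control_scores)


# Hypothetical roster of 20 students.
students = [f"student_{i}" for i in range(1, 21)]
treatment, control = randomly_assign(students, seed=1)
print(len(treatment), len(control))
```

Once outcomes have been collected, `mean_difference` gives the simplest possible estimate of the treatment effect.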

Regression analysis is a statistical technique 
used in research to determine the factors or 
characteristics (e.g., gender, family income lev- 
el, whether a student participated in a particular 
program) that are the best predictors of an out- 
come. Regression analyses help statisticians iso- 
late the relationships between individual factors 
and an outcome and, thus, are useful when trying
to understand the relationship of a program
to student achievement. 
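
A one-predictor version of this idea can be sketched in Python. The example is deliberately simplified: real evaluations usually include several factors at once, and the data below are made up. It fits a straight line by ordinary least squares, using program participation (1 = participated, 0 = did not) as the single predictor of a test score.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data: participation flags and test scores for six students.
participated = [0, 0, 0, 1, 1, 1]
scores = [70, 72, 74, 78, 80, 82]

intercept, slope = fit_line(participated, scores)
print(intercept, slope)
```

With a yes/no predictor, the slope equals the difference between the two groups' average scores, and the intercept is the average score of nonparticipants.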

Summative evaluations examine the effects 
or outcomes of a program. Findings from these 
evaluations are generally used to assess how 
well a program is meeting its stated goals. 

Treatment group refers to the population of 
subjects (in this case, students) who receive or 
participate in the treatment being studied (e.g., 
an online class).